This R notebook details the data processing and visualization for growth competition experiments with a CRISPRi sgRNA library. The library contains around 20,000 unique sgRNA repression mutants tailored for the cyanobacterium Synechocystis sp. PCC6803. It is the second version (hence “V2”) of an sgRNA library for Synechocystis and contains five instead of only two sgRNAs per gene. Some genes and ncRNAs are too short to accommodate five individual sgRNAs; these are targeted by fewer sgRNAs.
The first iteration of the Synechocystis sgRNA library was published in Nature Communications, 2020.
Load required packages.
suppressPackageStartupMessages({
library(tidyverse)
library(ggrepel)
library(lattice)
library(latticeExtra)
library(latticetools)
library(scales)
library(dendextend)
library(vegan)
library(tsne)
library(KEGGREST)
library(limma)
library(corrplot)
library(kableExtra)
library(grid)
library(ggpubr)
})
Define global figure style, default colors, and a plot saving function.
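The chunk that defines these helpers is not shown in this notebook. The following is a minimal sketch of hypothetical stand-ins for readers who want to run the code below. It assumes only the call signatures visible in this notebook: `custom_theme(...)` with theme arguments and an `aspect` argument, `custom_range(n)`, `custom_colors`, `save_plot(plot, width, height)`, and `list_fontpars`. The colors, font sizes, and output directory are placeholders, not the authors' choices. `custom_splom()` and `custom.colorblind()` are presumably provided by latticetools and are not re-created here.

```r
library(ggplot2)

# placeholder palette (hypothetical); the notebook indexes up to 5 colors
custom_colors <- c("#E7298A", "#66A61E", "#7570B3", "#1B9E77", "#D95F02")

# return n colors interpolated along the placeholder palette
custom_range <- function(n) {
  grDevices::colorRampPalette(custom_colors)(n)
}

# minimal theme: `aspect` is mapped to aspect.ratio, all other
# arguments are passed on to ggplot2::theme()
custom_theme <- function(aspect = NULL, ...) {
  th <- theme_bw(base_size = 10) +
    theme(panel.grid.minor = element_blank(), ...)
  if (!is.null(aspect)) th <- th + theme(aspect.ratio = aspect)
  th
}

# save a plot as SVG, naming the file after the plot object
# (hypothetical output directory)
save_plot <- function(plot, width = 5, height = 4, dir = "../figures") {
  file <- file.path(dir, paste0(deparse(substitute(plot)), ".svg"))
  svg(filename = file, width = width, height = height)
  print(plot)
  invisible(dev.off())
}

# font settings for panel labels in ggpubr::ggarrange()
list_fontpars <- list(size = 12, face = "bold", color = "black")
```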
Load raw data. The main table already contains the normalized quantification of all sgRNAs, fold changes, multiple-hypothesis-corrected p-values, and fitness scores. Unlike the processing of our first CRISPRi library (V1), much of the functionality of this notebook was transferred into the new CRISPRi library pipeline on GitHub.
# load first seq run
load("../data/input/DESeq2_result.Rdata")
df_main <- DESeq_result_table
# load second seq run
load("../data/input/DESeq2_result_2.Rdata")
df_main <- bind_rows(df_main, DESeq_result_table)
# remove single results table
rm(DESeq_result_table)
Several annotation columns are added to the main data frame. These include a short sgRNA identifier (the target name without the position on the gene), an sgRNA index (1 to 5), and genome annotation from Uniprot. The Uniprot data is downloaded anew for every update of this pipeline using their simple API (read_tsv("https://www.uniprot.org/uniprot/?query=taxonomy:1111708&format=tab")). The full list of columns that can be queried is available in the Uniprot documentation. Pathway annotation from KEGG is added later in the pipeline using the KEGGREST package.
df_main <- df_main %>%
# correct an error in sgRNA naming
mutate(sgRNA = gsub('”', '2', sgRNA)) %>%
# split sgRNA names into target gene and position
separate(sgRNA, into = c("sgRNA_target", "sgRNA_position"), sep = "\\|",
remove = FALSE) %>%
# add sgRNA index number (1 to maximally 5) and type
group_by(sgRNA_target) %>%
mutate(
sgRNA_position = as.numeric(sgRNA_position),
sgRNA_index = sgRNA_position %>% as.factor %>% as.numeric,
sgRNA_type = if_else(grepl("^nc_", sgRNA), "ncRNA", "gene")) %>%
ungroup %>%
# map trivial names to LocusTags using a manually curated list
left_join(
read_tsv("../data/input/mapping_trivial_names.tsv", col_types = cols()),
by = c("sgRNA_target" = "gene")) %>%
# remove some empty rows (NA targets)
filter(!is.na(sgRNA_target)) %>%
# remove 2 conditions without response
filter(!condition %in% c("BG11", "LC, 200uE")) %>%
# split condition into separate cols
separate(condition, into = c("carbon", "light", "treatment_1", "treatment_2"),
sep = ", ", remove = FALSE, fill = "right") %>%
unite("treatment", treatment_1, treatment_2, sep = ", ", na.rm = TRUE)
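The derivation of the sgRNA index in the block above can be illustrated in base R on a few hypothetical sgRNA names. Positions are converted to numbers before ranking because ranking the raw strings would sort "112" before "23". The ncRNA name below is made up for illustration.

```r
# hypothetical sgRNA names in the "target|position" format used above
sgRNA <- c("rps10|23", "rps10|112", "rps10|57", "nc_Ncl0010|5")

# split into target and numeric position
parts <- do.call(rbind, strsplit(sgRNA, "|", fixed = TRUE))
target <- parts[, 1]
position <- as.numeric(parts[, 2])

# index = rank of the position within each target (1 = lowest position)
index <- ave(position, target, FUN = function(p) as.numeric(as.factor(p)))
type <- ifelse(grepl("^nc_", sgRNA), "ncRNA", "gene")

data.frame(target, position, index, type)
```

The rps10 sgRNAs at positions 23, 112, and 57 receive the indices 1, 3, and 2.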
Overview of the different conditions.
df_cultivation_summary <- df_main %>% group_by(condition) %>%
summarize(
time_points = paste(unique(time), collapse = ", "),
carbon = unique(carbon),
light = unique(light),
treatment = unique(treatment),
min_fit = min(fitness),
med_fit = median(fitness),
max_fit = max(fitness))
print(df_cultivation_summary)
write_csv(df_cultivation_summary, file = "../data/output/cultivation_summary.csv")
Retrieve gene information from Uniprot and merge it with the main data frame. We need a custom function to retrieve and parse the data from Uniprot: on Ubuntu 20.04, the default SSL security level rejects the server's certificate. If the download fails, the function falls back on a local copy of the Uniprot annotation for this organism.
library(httr)
uniprot_url <- paste0(
"https://www.uniprot.org/uniprot/?query=taxonomy:1111708&format=tab&",
"columns=id,genes,genes(PREFERRED),protein_names,length,mass,ec,database(KEGG)")
get_uniprot <- function(url) {
# lower the SSL security level because of a faulty SSL certificate on the server side,
# see this thread: https://github.com/Ensembl/ensembl-rest/issues/427
httr_config <- config(ssl_cipher_list = "DEFAULT@SECLEVEL=1")
# any failure (connection, HTTP status, parsing) triggers the local fallback
tryCatch({
res <- with_config(config = httr_config, GET(url))
# turn HTTP error codes into R errors instead of parsing an error page as TSV
stop_for_status(res)
read_tsv(I(content(res, as = "text", encoding = "UTF-8")), col_types = cols())
},
error = function(e) {
message("Uniprot server not available, falling back on local Uniprot DB copy")
read_tsv("../data/input/uniprot_synechocystis.tsv", col_types = cols())
})
}
df_uniprot <- get_uniprot(uniprot_url) %>%
rename_with(tolower) %>%
rename(locus = `cross-reference (kegg)`, gene_name = `gene names`,
gene_name_short = `gene names (primary )`, ec_number = `ec number`,
protein = `protein names`, uniprot_ID = entry
) %>%
separate_rows(locus, sep = ";syn:") %>%
mutate(locus = str_remove_all(locus, "syn:|;")) %>%
filter(!is.na(locus))
df_main <- left_join(df_main, filter(df_uniprot, !duplicated(locus)),
by = "locus")
Each gene is represented by up to five sgRNAs. We can test whether all or only some of the five sgRNAs “behave” in the same way in the same conditions. More formally, we can estimate the correlation of every sgRNA with every other sgRNA of the same gene. First, let’s summarize how many genes have 5, 4, 3, 2, or 1 sgRNAs associated with them.
# N unique sgRNAs in dataset
paste0("Number of unique sgRNAs: ", unique(df_main$sgRNA) %>% length)
[1] "Number of unique sgRNAs: 21705"
# N genes with 1,2,3,4 or 5 sgRNAs
plot_sgRNAs_per_gene <- df_main %>%
group_by(sgRNA_type, sgRNA_target) %>%
summarize(n_sgRNAs = length(unique(sgRNA_position)), .groups = "drop_last") %>%
count(n_sgRNAs) %>% filter(n_sgRNAs <= 5) %>%
ggplot(aes(x = factor(n_sgRNAs, 5:1), y = n, label = n)) +
geom_col(show.legend = FALSE) +
geom_text(size = 3, nudge_y = 200, color = grey(0.5)) +
facet_grid(~ sgRNA_type) +
labs(x = "n sgRNAs / target", y = "n targets") +
coord_cartesian(ylim = c(-50, 3500)) +
custom_theme()
print(plot_sgRNAs_per_gene)
save_plot(plot_sgRNAs_per_gene, width = 6, height = 3.5)
Before the biological analysis, we need to check whether fitness (and the log2 FC from which it is calculated) is comparably distributed across conditions. For example, strictly essential genes such as ribosomal genes should show the same degree of depletion over time, regardless of condition.
We can compare fitness over all conditions using a scatter plot matrix. Some conditions are very similar to each other, for example the conditions treated with glucose (LC, LL +g; LC, LL, +D, +G; HC, LL +g). Others, for example LC, IL and LC, LL, +FL, differ from the rest but are similar to each other. This hints at experimental bias, because LC, LL, +FL would be expected to resemble LC, LL more closely. Indeed, both of these conditions (and LC, LL, +G) were pre-cultivated in low light instead of high light, unlike the rest of the samples.
df_main %>% filter(time == 0, sgRNA_index == 1) %>%
select(locus, condition, fitness) %>%
filter(!is.na(locus)) %>%
pivot_wider(names_from = condition, values_from = fitness) %>%
select(-locus) %>%
custom_splom(pch = 19, cex = 0.3, col = grey(0.4, 0.4), pscales = 0)
To account for experimental or quantification bias, we can normalize the log2 FC distributions between all samples and then re-calculate fitness. The underlying assumption is that, for example, essential genes should be depleted at the same rate and hence show identical log2 FC at identical time points. Different types of experimental bias influence the global fitness distribution and should be reduced by normalization. In a quick comparison, cyclic loess and quantile normalization both gave good results; here we use quantile normalization.
# construct a normalization function that takes three vectors as input:
# the numeric variable to be normalized, the conditioning variable
# (character or factor), and an ID that identifies each observation (sgRNA)
apply_norm = function(id, cond, var) {
df_orig <- tibble(id = id, cond = cond, var = var)
df_new <- pivot_wider(df_orig, names_from = cond, values_from = var) %>%
column_to_rownames("id") %>% as.matrix %>%
limma::normalizeBetweenArrays(method = "quantile") %>%
as_tibble(rownames = "id") %>%
pivot_longer(-id, names_to = "cond", values_to = "var_norm")
left_join(df_orig, df_new, by = c("id", "cond")) %>% pull(var_norm)
}
# apply normalization
df_main <- df_main %>%
mutate(FoldChange = 2^log2FoldChange) %>%
group_by(time) %>%
mutate(
FoldChange_norm = apply_norm(sgRNA, condition, FoldChange),
log2FoldChange = log2(FoldChange_norm)
) %>% ungroup
# compare effect of normalization
df_main %>% group_by(condition) %>% slice(1:10000) %>%
ggplot(aes(x = log2(FoldChange), y = log2(FoldChange_norm), color = factor(time))) +
geom_point(size = 0.5) +
facet_wrap(~ condition, ncol = 4) +
custom_theme() +
scale_color_manual(values = custom_colors)
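For intuition, the quantile normalization performed by `apply_norm()` can be written out in a few lines of base R. Each column (condition) is ranked, and each value is replaced by the mean of all columns' values at that rank, so every column ends up with the same distribution. This is a simplified sketch: `limma::normalizeBetweenArrays()` also handles ties and missing values, while this version simply breaks ties by order of appearance.

```r
quantile_normalize <- function(mat) {
  # mean of the sorted values across columns, one value per rank
  rank_means <- rowMeans(apply(mat, 2, sort))
  # rank of each value within its column (ties broken by position)
  ranks <- apply(mat, 2, rank, ties.method = "first")
  # replace every value by the mean at its rank
  out <- apply(ranks, 2, function(r) rank_means[r])
  dimnames(out) <- dimnames(mat)
  out
}

# hypothetical fold changes of three sgRNAs in two conditions
fc <- cbind(A = c(5, 2, 3), B = c(4, 1, 6))
norm <- quantile_normalize(fc)
norm
# column A becomes 5.5, 1.5, 3.5 and column B becomes 3.5, 1.5, 5.5
```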
Another way to look at the result of the normalization is to compare the global distribution of log2 FC values before and after normalization as density plots.
library(ggridges)
df_main %>% filter(time == 10) %>%
select(sgRNA, condition, FoldChange, FoldChange_norm) %>%
pivot_longer(matches("^Fold"), names_to = "metric", values_to = "FC") %>%
distinct %>%
ggplot(aes(x = log2(FC), y = condition, group = condition)) +
geom_density_ridges(fill = "#00AFBB99", col = grey(0.4)) +
facet_wrap(~ metric, ncol = 4) +
lims(x = c(-2, 1.5)) +
custom_theme()
Now we need to re-calculate fitness based on the normalized log2 FC.
df_main <- df_main %>%
select(-FoldChange, -FoldChange_norm) %>%
group_by(sgRNA, condition) %>%
mutate(fitness = DescTools::AUC(time, log2FoldChange)/(max(time)/2)) %>%
arrange(sgRNA_target, sgRNA_index, condition, time)
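Fitness is the area under the log2 FC curve divided by half of the last time point. Assuming that `DescTools::AUC()` uses its default trapezoid rule, the calculation can be reproduced in base R. For a trajectory that starts at 0 and declines linearly, this normalization makes fitness equal to the log2 FC at the last time point, which is a useful sanity check.

```r
fitness_auc <- function(time, log2fc) {
  ord <- order(time)
  t <- time[ord]
  y <- log2fc[ord]
  # trapezoid rule: width of each interval times mean height of its two ends
  auc <- sum(diff(t) * (head(y, -1) + tail(y, -1)) / 2)
  # normalize by half the duration of the experiment
  auc / (max(t) / 2)
}

# hypothetical sgRNA depleted linearly from 0 to -2 over 8 time units
fitness_auc(time = c(0, 4, 8), log2fc = c(0, -1, -2))
# -2, i.e. the final log2 FC
```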
Different methods can be used to estimate similarity between samples (sgRNAs). For example, dimensionality reduction methods dissect the underlying sources of variation within a dataset and their contribution to overall variation; the best-known example is principal component analysis (PCA). Alternatively, we can use the correlation coefficients of sgRNAs to each other to see whether one of the sgRNAs deviates from the others.
This is an example of an apparently strictly essential gene, rps10, encoding the ribosomal protein S10. Most of the sgRNA repression strains are depleted, and the correlation between sgRNAs is high. The strength of depletion varies, though, and the strain with sgRNA 3 is not depleted at all. We want to give higher weights to sgRNAs that correlate well with each other and/or show a stronger effect (depletion/enrichment).
plot_sgRNA_ribo_example <- df_main %>% filter(sgRNA_target == "rps10") %>%
mutate(sgRNA_index = factor(sgRNA_index, 1:5)) %>%
ggplot(aes(x = time, y = log2FoldChange, color = sgRNA_index)) +
geom_line(size = 1) + geom_point(size = 2) +
facet_wrap(~ condition, ncol = 4) +
custom_theme() +
scale_color_manual(values = custom_range(5))
print(plot_sgRNA_ribo_example)
save_plot(plot_sgRNA_ribo_example, width = 7, height = 5.5)
A correlation score is obtained by computing the correlation coefficients of all sgRNAs of a gene to each other. For each sgRNA, these coefficients are summarized robustly by their median, which is then rescaled from its possible range [-1, 1] to [0, 1]. This score serves as one weight component for each sgRNA when calculating the (global) weighted mean of log2 FC over all sgRNAs. An sgRNA perfectly correlated with all other sgRNAs of the same gene gets a weight of 1. An sgRNA perfectly anti-correlated with them gets a weight of 0.
For a matrix of \(x = 1 .. m\) sgRNAs and \(y = 1 .. n\) observations (measurements), the correlation \(R_{x,j}\) of sgRNA \(x\) to another sgRNA \(j\) of the same gene is calculated using Pearson’s method:
\(R_{x,j} = \mathrm{cor}\left([\log_2 FC_{x,1}, \ldots, \log_2 FC_{x,n}],\ [\log_2 FC_{j,1}, \ldots, \log_2 FC_{j,n}]\right)\)
The correlation weight of sgRNA \(x\) is then the median of its correlations to all other sgRNAs (\(j \neq x\)), rescaled from [-1, 1] to [0, 1]:
\(w_x = \frac{1 + \mathrm{median}_{j \neq x}\left(R_{x,j}\right)}{2}\)
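The two equations above can be checked in base R on a small hypothetical matrix of log2 FC values (rows = sgRNAs, columns = observations). This sketch removes only the diagonal of the correlation matrix, which corresponds to excluding the self-correlations. The three consistent sgRNAs receive weights close to 1. The fourth, essentially uncorrelated sgRNA receives a weight close to 0.5.

```r
# hypothetical log2 FC of four sgRNAs (rows) in four observations (columns)
lfc <- rbind(
  sg1 = c(-0.5, -1.0, -2.0, -3.0),
  sg2 = c(-0.4, -1.2, -1.8, -2.9),
  sg3 = c(-0.6, -0.9, -2.2, -3.1),
  sg4 = c( 0.1,  0.0, -0.1,  0.1)  # an sgRNA without a consistent effect
)

correlation_weights <- function(lfc) {
  # Pearson correlation of every sgRNA with every other sgRNA
  r <- cor(t(lfc), method = "pearson")
  # exclude self-correlations
  diag(r) <- NA
  # w_x = (1 + median_j R_xj) / 2
  (1 + apply(r, 2, median, na.rm = TRUE)) / 2
}

w <- correlation_weights(lfc)
round(w, 2)
```

A perfectly anti-correlated sgRNA would receive a weight of 0.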
The following example shows the correlation matrix for the 5 rps10 sgRNAs, and their weights. The self correlation of each sgRNA (R = 1) is removed prior to weight determination.
cor_matrix <- df_main %>% filter(sgRNA_target == "rps10") %>% ungroup %>%
select(sgRNA_index, log2FoldChange, condition, time) %>%
pivot_wider(names_from = c("condition", "time"), values_from = log2FoldChange) %>%
arrange(sgRNA_index) %>% column_to_rownames("sgRNA_index") %>%
as.matrix %>% t %>% cor(method = "pearson")
# remove the self-correlation of each sgRNA (diagonal) before summarizing
diag(cor_matrix) <- NA
weights <- cor_matrix %>%
apply(2, function(x) median(x, na.rm = TRUE)) %>%
rescale(from = c(-1, 1), to = c(0, 1))
# plot heatmap
lattice::levelplot(cor_matrix,
col.regions = custom_range(20))
# print weights
weights
1 2 3 4 5
0.8440521 0.7864564 0.4605635 0.8265134 0.7689177
Now we can create a function that will compute weights for all sgRNAs, and add the weights to the data set.
determine_corr <- function(index, value, condition, time) {
# make correlation matrix
df <- data.frame(index = index, value = value, condition = condition, time = time)
cor_matrix <- pivot_wider(df, names_from = c("condition", "time"), values_from = value) %>%
arrange(index) %>% column_to_rownames("index") %>%
as.matrix %>% t %>% cor(method = "pearson")
# determine weights, excluding only the self-correlations on the diagonal
diag(cor_matrix) <- NA
weights <- cor_matrix %>%
apply(2, function(x) median(x, na.rm = TRUE)) %>%
scales::rescale(from = c(-1, 1), to = c(0, 1)) %>%
enframe("index", "weight") %>% mutate(index = as.numeric(index)) %>%
mutate(weight = replace(weight, is.na(weight), 1))
# return vector of weights the same order and length
# as sgRNA index vector
left_join(df, weights, by = "index") %>% pull(weight)
}
df_main <- df_main %>%
group_by(sgRNA_target) %>%
mutate(sgRNA_correlation = determine_corr(sgRNA_index,
log2FoldChange, condition, time))
The correlation weight of each sgRNA is a “global” parameter: it is computed across all conditions and is therefore identical in every condition. A second global parameter, sgRNA efficiency, can be obtained in a similar way. We do not expect the fitness values of all sgRNAs for one gene to be normally distributed, because sgRNAs are not ideal replicate measurements. They are biased by position effects and off-target binding. See Wang et al., Nature Comms, 2018, for an insightful and comprehensive analysis of the number and position of sgRNAs required to estimate gene fitness.
We calculate sgRNA efficiency \(E\) as the median absolute fitness (AUC of log2 FC over time) of an sgRNA \(x = 1 .. m\) over all observations (conditions) \(y = 1 .. n\).
\(E_x = \mathrm{median}\left(|fitness_{x,1}|, |fitness_{x,2}|, \ldots, |fitness_{x,n}|\right)\)
To make the sgRNAs of a gene comparable, \(E\) is divided by the maximum over all \(m\) sgRNAs of that gene. This rescales it to a range between 0 and 1, and the most efficient sgRNA gets a value of 1.
\(E'_x = \frac{E_x}{\max(E_1, E_2, \ldots, E_m)}\)
df_main <- df_main %>% group_by(sgRNA_target) %>%
mutate(sgRNA_efficiency = ave(fitness, sgRNA_index, FUN = function(x) median(abs(x))) %>%
{./max(.)})
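The efficiency calculation can be reproduced with base R's `ave()`. It takes the per-sgRNA median of absolute fitness and divides it by the gene-wide maximum. The fitness values below are hypothetical, for one gene with three sgRNAs and two observations each.

```r
# hypothetical fitness values: sgRNA 1 strong, sgRNA 2 weak, sgRNA 3 moderate
fitness <- c(-4.0, -3.5, -0.2, 0.1, -1.0, -2.0)
sgRNA_index <- c(1, 1, 2, 2, 3, 3)

# median absolute fitness per sgRNA, repeated for each of its rows
efficiency <- ave(fitness, sgRNA_index, FUN = function(x) median(abs(x)))
# scale so that the most efficient sgRNA of the gene gets 1
efficiency <- efficiency / max(efficiency)
efficiency
# efficiencies: 1, 1, 0.04, 0.04, 0.4, 0.4
```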
This is the resulting sgRNA efficiency for the example gene above, rps10.
df_main %>% filter(sgRNA_target == "rps10") %>% ungroup %>%
select(sgRNA_index, sgRNA_efficiency) %>% distinct %>%
arrange(sgRNA_index) %>% deframe
1 2 3 4 5
1.0000000 0.1519365 0.0351794 0.2105323 0.5110918
Plot the weight of each sgRNA to see whether correlation depends on sgRNA position. There is no obvious trend.
We can also quantify how many genes have strongly correlated sgRNAs and how many have outliers. To do this, the median weight of the (up to) 5 sgRNAs per gene is plotted as a histogram. The median weight generally ranges between 0.5 and 1.0, showing good correlation on average.
plot_sgRNA_correlation <- df_main %>%
select(sgRNA_target, sgRNA_index, sgRNA_correlation) %>%
filter(sgRNA_index <= 5) %>%
distinct %>%
# plot
ggplot(aes(x = factor(sgRNA_index), y = sgRNA_correlation)) +
geom_boxplot(outlier.shape = NA) +
labs(x = "sgRNA position", y = "correlation") +
stat_summary(fun.data = function(x) c(y = median(x)+0.07,
label = round(median(x), 2)), geom = "text", size = 3) +
stat_summary(fun.data = function(x) c(y = 1.1,
label = length(x)), geom = "text", color = grey(0.5), size = 3) +
coord_cartesian(ylim = c(-0.15, 1.15)) +
custom_theme()
plot_sgRNA_correlation_hist <- df_main %>%
select(sgRNA_target, sgRNA_index, sgRNA_correlation) %>%
filter(sgRNA_index <= 5) %>%
distinct %>% group_by(sgRNA_target) %>%
summarize(
median_sgRNA_correlation = median(sgRNA_correlation),
min_sgRNA_correlation = min(sgRNA_correlation)
) %>%
# plot
ggplot(aes(x = median_sgRNA_correlation)) +
geom_histogram(bins = 40, fill = custom_colors[1], alpha = 0.7) +
custom_theme()
save_plot(plot_sgRNA_correlation_hist, width = 5, height = 4)
save_plot(plot_sgRNA_correlation, width = 5, height = 4)
ggarrange(plot_sgRNA_correlation, plot_sgRNA_correlation_hist, ncol = 2)
Second, the binding position of an sgRNA could be correlated with the strength of repression. In other words, sgRNAs binding closer to the promoter could repress a gene more strongly; see Figure 1 B in Wang et al., Nature Comms, 2018. We plot sgRNA efficiency for genes only, because the vast majority of them have 5 sgRNAs.
plot_sgRNA_efficiency <- df_main %>%
filter(sgRNA_index <= 5, sgRNA_type == "gene") %>%
select(sgRNA_target, sgRNA_index, sgRNA_efficiency) %>% distinct %>%
ggplot(aes(x = factor(sgRNA_index), y = sgRNA_efficiency)) +
geom_boxplot(notch = FALSE, outlier.shape = ".") +
labs(x = "sgRNA position (relative)", y = "repression efficiency") +
coord_cartesian(ylim = c(-0.15, 1.15)) +
stat_summary(fun.data = function(x) c(y = median(x)+0.07,
label = round(median(x), 2)), geom = "text", size = 3) +
stat_summary(fun.data = function(x) c(y = 1.1,
label = length(x)), geom = "text", color = grey(0.5), size = 3) +
custom_theme()
plot_sgRNA_efficiency_hist <- df_main %>%
filter(sgRNA_index <= 5, sgRNA_type == "gene") %>%
select(sgRNA_target, sgRNA_position, sgRNA_efficiency) %>% distinct %>%
group_by(sgRNA_position) %>%
summarize(sgRNA_efficiency = median(sgRNA_efficiency), n_pos = n()) %>%
filter(n_pos >= 10) %>%
ggplot(aes(x = sgRNA_position, y = sgRNA_efficiency)) +
labs(x = "sgRNA position (nt)", y = "repression efficiency") +
geom_point(col = alpha(custom_colors[5], 0.5)) +
geom_smooth() +
custom_theme()
save_plot(plot_sgRNA_efficiency, width = 5, height = 4)
save_plot(plot_sgRNA_efficiency_hist, width = 5, height = 4)
ggarrange(plot_sgRNA_efficiency, plot_sgRNA_efficiency_hist, ncol = 2)
Export draft Figure 1 for manuscript.
plot_selected_sgRNAs <- df_main %>%
filter(
grepl("ctrl[1-5]$|rps10$", sgRNA_target),
condition %in% c("HC, HL", "HC, LL", "LC, IL", "LC, LL")) %>%
mutate(
sgRNA_index2 = as.numeric(str_extract(sgRNA_target, "[1-9]$")),
sgRNA_index = case_when(sgRNA_position == 0 ~ sgRNA_index2, TRUE ~ sgRNA_index),
sgRNA_target = str_extract(sgRNA_target, "[a-zA-Z]*")
) %>%
ggplot(aes(x = time, y = log2FoldChange, color = factor(sgRNA_index))) +
geom_line(size = 1) + geom_point(size = 2) +
facet_grid(sgRNA_target ~ condition) +
custom_theme(legend.position = "none") +
coord_cartesian(ylim = c(-4.5, 2.5)) +
scale_color_manual(values = custom_range(5))
svg(filename = "../figures/figure1.svg", width = 7, height = 5.5)
ggarrange(ncol = 2, nrow = 2, widths = c(0.6, 0.4), labels = LETTERS[1:4], font.label = list_fontpars,
plot_sgRNAs_per_gene + theme(plot.margin = unit(c(12,12,12,12), "points")),
plot_sgRNA_efficiency + theme(plot.margin = unit(c(26,12,12,12), "points")),
plot_selected_sgRNAs + theme(plot.margin = unit(c(12,-4,12,14), "points")),
plot_sgRNA_correlation + theme(plot.margin = unit(c(26,12,12,12), "points"))
)
dev.off()
Export supplemental figure with all ribosomal genes (rpsNN/rplNN).
plot_sgRNAs_ribosome <- df_main %>%
filter(str_detect(sgRNA_target, "rp[sl][0-9]*$")) %>%
filter(condition == "LC, LL") %>%
ggplot(aes(x = time, y = log2FoldChange, color = factor(sgRNA_index))) +
geom_line(size = 1) + geom_point(size = 2) +
facet_wrap(~ sgRNA_target, ncol = 7) +
custom_theme(legend.position = "top") +
scale_color_manual(values = custom_range(5))
print(plot_sgRNAs_ribosome)
With the correlation and the efficiency of each sgRNA, we can compute the weighted mean over all sgRNAs of a gene. For comparison, we also test simpler strategies: the arithmetic mean, and “top 1” and “top 2” strategies that use only the one or two sgRNAs with the highest efficiency. Metrics are calculated for log2 FC and fitness.
df_controls <- df_main %>% ungroup %>%
filter(str_detect(sgRNA_target, "ctrl[0-9]+$"))
df_gene <- df_main %>%
# keep all annotation columns
group_by(sgRNA_target, sgRNA_type, locus, gene_name, condition,
carbon, light, treatment, time) %>%
# summarize FC and fitness...
summarize(.groups = "drop",
# log2 FC
mean_log2FoldChange = mean(log2FoldChange),
wmean_log2FoldChange = weighted.mean(log2FoldChange, sgRNA_correlation * sgRNA_efficiency),
top1_log2FoldChange = log2FoldChange[which.max(sgRNA_efficiency)],
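# head() instead of [1:2] avoids an NA for targets with a single sgRNA
top2_log2FoldChange = mean(log2FoldChange[head(order(sgRNA_efficiency, decreasing = TRUE), 2)]),
sd_log2FoldChange = sd(log2FoldChange),
# fitness
mean_fitness = mean(fitness),
wmean_fitness = weighted.mean(fitness, sgRNA_correlation * sgRNA_efficiency),
top1_fitness = fitness[which.max(sgRNA_efficiency)],
top2_fitness = mean(fitness[head(order(sgRNA_efficiency, decreasing = TRUE), 2)]),
sd_fitness = sd(fitness),
# apply significance test, Mann-Whitney U test, against the control sgRNAs of the
# same condition and time point; base R subsetting is used because inside
# filter(df_controls, ...) `condition` would refer to the column of df_controls
p_value = wilcox.test(fitness, df_controls$fitness[
df_controls$condition == condition[1] & df_controls$time == time[1]])$p.value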
)
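To make the four aggregation strategies concrete, the sketch below applies them to hypothetical log2 FC values of one gene with five sgRNAs, in a single condition and time point. As in the `summarize()` call above, the weight of each sgRNA is the product of its correlation weight and its efficiency. `head(order(...), 2)` is used so that the top 2 strategy also works for targets with a single sgRNA.

```r
# hypothetical values for five sgRNAs of one gene
log2fc      <- c(-4.0, -2.0, 0.0, -2.0, -1.0)
correlation <- c( 1.0,  1.0, 0.5,  1.0,  1.0)
efficiency  <- c( 1.0,  0.6, 0.0,  0.4,  0.2)

top_by_efficiency <- head(order(efficiency, decreasing = TRUE), 2)

agg <- c(
  mean  = mean(log2fc),
  wmean = weighted.mean(log2fc, correlation * efficiency),
  top1  = log2fc[which.max(efficiency)],
  top2  = mean(log2fc[top_by_efficiency])
)
agg
# mean -1.8, wmean about -2.82, top1 -4, top2 -3

# a target with a single sgRNA: [1:2] indexes an NA, head() does not
mean(c(-3)[order(0.5, decreasing = TRUE)[1:2]])      # NA
mean(c(-3)[head(order(0.5, decreasing = TRUE), 2)])  # -3
```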
Since statistical significance is tested for many genes in parallel, the p-values obtained from the MWU test should be corrected for multiple hypothesis testing. For this purpose, we use the Benjamini-Hochberg method. We also calculate a score that takes both effect size and p-value into account, following Wang et al., Nat Comm, 2018. This score is simply the absolute weighted mean fitness multiplied by the negative log10 of the adjusted p-value.
df_gene <- df_gene %>%
group_by(condition, time) %>%
mutate(
p_value_adj = p.adjust(p_value, method = "BH"),
score = abs(wmean_fitness)*-log10(p_value_adj)
) %>% ungroup
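The Benjamini-Hochberg adjustment and the score can be illustrated with base R's `p.adjust()` on four hypothetical genes. A score threshold of 4 is equivalent to -log10(p_adj) >= 4/|fitness|. The dashed curve in the volcano plot further below appears to trace exactly this hyperbola.

```r
# hypothetical raw p-values and weighted mean fitness of four genes
p_value <- c(0.01, 0.02, 0.03, 0.5)
wmean_fitness <- c(-3, 1, -0.5, 2)

# BH: p * m / rank, then cumulative minimum starting from the largest p
p_value_adj <- p.adjust(p_value, method = "BH")
p_value_adj       # 0.04 0.04 0.04 0.50

score <- abs(wmean_fitness) * -log10(p_value_adj)
round(score, 2)   # 4.19 1.40 0.70 0.60
```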
A comparison of log2 FC aggregated by the different methods shows clear differences. For the example gene rps10, the weighted mean and the top methods give similar results, reflecting the stronger influence of highly depleted sgRNA repression strains. The regular mean is robust but “shallow”, probably underestimating the real effect on fitness. The top 1 method simply picks the sgRNA with the highest efficiency, i.e. the most strongly depleted/enriched sgRNA over all conditions, as representative.
df_gene %>% filter(sgRNA_target == "rps10") %>%
pivot_longer(cols = matches("[n12]_log2FoldChange"),
names_to = "metric", values_to = "log2FoldChange") %>%
mutate(metric = str_remove(metric, "_log2FoldChange")) %>%
ggplot(aes(x = time, y = log2FoldChange,
ymin = log2FoldChange-sd_log2FoldChange,
ymax = log2FoldChange+sd_log2FoldChange, color = fct_inorder(metric))) +
geom_line(size = 1) + geom_point(size = 2) + geom_linerange(size = 1) +
facet_wrap(~ condition, ncol = 4) +
custom_theme(legend.position = "right") +
coord_cartesian(ylim = c(-3.75, 0.75)) +
scale_color_manual(values = custom_range(4))
This plot compares the 4 methods for the first 36 genes in alphabetical order, for one selected condition only (1% CO2, BG11, 1,000 µmol photons m-2 s-1). The top 1 method is often, but not always, representative of the gene: for apcD or apcF, it deviates from the mean, weighted mean, and top 2 methods.
df_gene %>% filter(
gene_name %in% unique(.data[["gene_name"]])[1:36],
condition == "HC, HL"
) %>%
pivot_longer(cols = matches("[n12]_log2FoldChange"), names_to = "metric", values_to = "log2FoldChange") %>%
mutate(metric = str_remove(metric, "_log2FoldChange")) %>%
ggplot(aes(x = time, y = log2FoldChange,
ymin = log2FoldChange-sd_log2FoldChange,
ymax = log2FoldChange+sd_log2FoldChange, color = fct_inorder(metric))) +
geom_line(size = 1) + geom_point(size = 2) + geom_linerange(size = 1) +
facet_wrap(~ sgRNA_target, ncol = 7) +
custom_theme(legend.position = "top") +
coord_cartesian(ylim = c(-5, 5)) +
scale_color_manual(values = custom_range(4))
Global distribution of weighted mean fitness for all genes. The effect of ncRNA repression appears to be much weaker than the effect of gene repression.
plot_all_fitness_hist <- df_gene %>% filter(time == 0) %>%
ggplot(aes(x = wmean_fitness, fill = sgRNA_type)) +
geom_histogram(bins = 100) +
coord_cartesian(xlim = c(-4, 4), ylim = c(0, 1000)) +
facet_wrap( ~ condition, ncol = 4) +
custom_theme() +
scale_fill_manual(values = custom_colors[c(3:4)])
print(plot_all_fitness_hist)
save_plot(plot_all_fitness_hist, width = 7, height = 5)
plot_all_fitness_volc <- df_gene %>% filter(time == 0,
condition %in% c("HC, HL", "LC, LL")) %>%
arrange(sgRNA_type) %>%
ggplot(aes(x = wmean_fitness, y = -log10(p_value_adj), col = sgRNA_type)) +
geom_point(alpha = 0.5, size = 0.5) +
geom_line(data = data.frame(x = c(seq(-6, -0.5, 0.1), seq(0.5, 6, 0.1)),
y = 4/c(seq(6, 0.5, -0.1), seq(0.5, 6, 0.1))),
aes(x = x, y = y, shape = NULL, col = NULL), lty = 2) +
coord_cartesian(xlim = c(-7, 7), ylim = c(0, 4)) +
custom_theme(aspect = 1, legend.position = "left", legend.key.size = unit(0.4, "cm")) +
facet_wrap(~ condition) +
labs(x = "fitness", y = expression("-log"[10]*" p-value")) +
scale_color_manual(values = custom_colors[3:4]) +
scale_shape_manual(values=c(1, 19))
print(plot_all_fitness_volc)
save_plot(plot_all_fitness_volc, width = 6, height = 3)
Ten sgRNAs without gene-specific targets were included in the library as negative controls. The following plot shows that these controls do not affect strain fitness, except possibly two sgRNAs in one specific condition.
plot_controls_sgRNAs <- df_main %>% filter(grepl("ctrl", sgRNA_target)) %>%
ggplot(aes(x = time, y = log2FoldChange, color = sgRNA_target)) +
geom_line(size = 1) + geom_point(size = 2) + ylim(-5, 5) +
facet_wrap(~ condition, ncol = 4) +
custom_theme() +
scale_color_manual(values = custom_range(10))
print(plot_controls_sgRNAs)
save_plot(plot_controls_sgRNAs, width = 7, height = 5.5)
To plot gene fitness for the enzymes of central carbon metabolism, we need a complete list of enzymes and the genes that they are mapped to. To list the different KEGG databases that can be queried, use listDatabases(). Gene-pathway mappings are obtained and merged with pathway names and gene/enzyme names.
# get mapping of pathways for each gene
df_kegg <- keggLink("pathway", "syn") %>%
enframe(name = "locus", value = "kegg_pathway_id") %>%
# get list of pathways with name/ID pairs
left_join(by = "kegg_pathway_id",
keggList("pathway", "syn") %>%
enframe(name = "kegg_pathway_id", value = "kegg_pathway")
) %>%
# get list of gene/enzyme names
left_join(by = "locus",
keggList("syn") %>%
enframe(name = "locus", value = "kegg_gene") %>%
mutate(kegg_gene_short = str_extract(kegg_gene, "^[a-zA-Z0-9]*;") %>%
str_remove(";"))
) %>%
# trim useless prefixes
mutate(
locus = str_remove(locus, "syn:"),
kegg_pathway_id = str_remove(kegg_pathway_id, "path:"),
kegg_pathway = str_remove(kegg_pathway, " - Synechocystis sp. PCC 6803")
)
head(df_kegg)
Sometimes even small effects on fitness can be relevant if several genes of the same pathway (or iso-enzymes) are affected. A simple fitness threshold will not reveal such changes. In these cases, a more nuanced approach such as gene set enrichment analysis (GSEA) can be taken. Several packages exist to test whether functionally related genes are enriched, depleted, or both in the same conditions.
Before testing for enrichment of associated pathways/GO terms, we can look at the general depletion/enrichment per KEGG pathway. The median fitness per pathway and condition can be visualized as a box plot with overlaid points.
plot_median_fitness_kegg <- df_gene %>% filter(time == 0) %>%
inner_join(df_kegg, by = "locus") %>%
group_by(kegg_pathway, condition) %>%
summarize(.groups = "drop",
fitness = median(wmean_fitness),
n_genes = n()
) %>% filter(n_genes >= 20) %>%
mutate(kegg_pathway = paste0(str_sub(kegg_pathway, 1, 25), "..")) %>%
mutate(kegg_pathway = fct_reorder(kegg_pathway, fitness, .desc = TRUE)) %>%
ggplot(aes(x = fitness, y = kegg_pathway)) +
geom_boxplot(outlier.shape = NA, color = grey(0.5), fill = grey(0.9)) +
geom_point(aes(color = condition)) +
geom_vline(xintercept = 0, lty = 2, color = grey(0.5)) +
labs(x = "median fitness", y = "") +
custom_theme(legend.position = c(0.25, 0.25), legend.key.size = unit(0.4, "cm")) +
scale_fill_manual(values = custom_range(11)) +
scale_color_manual(values = custom_range(11))
print(plot_median_fitness_kegg)
Export draft Figure 2 for the manuscript. We add photosystem I and II genes as examples of differential depletion, shown as heatmaps.
plot_sgRNAs_ps1 <- df_gene %>%
filter(str_detect(sgRNA_target, "psa[A-Z]*"), time == 0) %>%
mutate(wmean_fitness = wmean_fitness %>% replace(., . > 4, 4) %>% replace(., . < -4, -4)) %>%
ggplot(aes(x = condition, y = fct_rev(sgRNA_target), fill = wmean_fitness)) +
geom_tile() + custom_theme() +
labs(title = "Photosystem I", x = "", y = "") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_gradientn(colours = c(custom_colors[1], grey(0.9), custom_colors[2]),
limits = c(-4, 4))
plot_sgRNAs_ps2 <- df_gene %>%
filter(str_detect(sgRNA_target, "psb[A-Z]*"), time == 0) %>%
mutate(wmean_fitness = wmean_fitness %>% replace(., . > 4, 4) %>% replace(., . < -4, -4)) %>%
mutate(sgRNA_target = str_replace(sgRNA_target, "psb13", "psbW")) %>%
ggplot(aes(x = condition, y = fct_rev(sgRNA_target), fill = wmean_fitness)) +
geom_tile() + custom_theme() +
labs(title = "Photosystem II", x = "", y = "") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_gradientn(colours = c(custom_colors[1], grey(0.9), custom_colors[2]),
limits = c(-4, 4))
ggarrange(ncol = 2, plot_sgRNAs_ps1, plot_sgRNAs_ps2)
svg(filename = "../figures/figure2.svg", width = 8, height = 7)
ggarrange(ncol = 2, widths = c(0.65, 0.35),
ggarrange(nrow = 2, heights = c(0.34, 0.66), labels = LETTERS[1:2], font.label = list_fontpars,
plot_all_fitness_volc + theme(plot.margin = unit(c(14,-8,14,40), "points")),
plot_median_fitness_kegg + theme(plot.margin = unit(c(6,12,12,12), "points"))),
ggarrange(nrow = 2, heights = c(0.4, 0.6), labels = LETTERS[3:4], font.label = list_fontpars,
plot_sgRNAs_ps1 + theme(plot.margin = unit(c(12,0,-14,0), "points")),
plot_sgRNAs_ps2 + theme(plot.margin = unit(c(12,0,0,0), "points"))
)
)
dev.off()
We use the functions kegga (KEGG pathway enrichment) and goana (GO term enrichment) from the limma package. Both functions test for over-representation of genes associated with certain pathways or GO terms among the differential fitness (DF) genes, here genes with |fitness| > 2. They only consider whether a gene is DF and do not take the strength of depletion/enrichment over time into account.
df_kegg_enrichment <- lapply(unique(df_gene$condition), function(cond) {
df_gene %>% filter(
sgRNA_type == "gene", time == 0,
condition == cond) %>%
# filter for differential fitness (DF) genes
filter(!between(wmean_fitness, -2.0, 2.0), !is.na(locus)) %>%
# perform KEGG enrichment
pull(locus) %>% kegga(species.KEGG = "syn") %>%
mutate(condition = cond)
}) %>% bind_rows
head(df_kegg_enrichment)
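Over-representation of a pathway among DF genes is commonly assessed with a one-sided hypergeometric test. The base R sketch below shows such a test for hypothetical counts and confirms that it matches a one-sided Fisher's exact test. It illustrates the idea only and is not a re-implementation of kegga(), whose exact procedure is described in the limma documentation.

```r
# k: DF genes in the pathway, K: genes in the pathway,
# n: DF genes in total,       N: all genes tested (the universe)
ora_p_value <- function(k, K, n, N) {
  # probability of observing k or more pathway genes among n DF genes
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# hypothetical counts: 30 of 50 pathway genes are DF, 200 of 3000 overall
p_hyper <- ora_p_value(k = 30, K = 50, n = 200, N = 3000)

# the same test as a one-sided Fisher's exact test on a 2x2 table
tab <- matrix(c(30, 200 - 30, 50 - 30, 3000 - 50 - 200 + 30), nrow = 2,
  dimnames = list(pathway = c("yes", "no"), DF = c("yes", "no")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# about 3.3 pathway genes are expected by chance, so k = 3 is not enriched
p_null <- ora_p_value(k = 3, K = 50, n = 200, N = 3000)
```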
Now we visualize the pathways that are most enriched for DF genes. Ribosomal proteins are extremely depleted, so the ribosome pathway scores high on the negative log10 p-value for pathway enrichment.
df_kegg_enrichment %>%
rename(kegg_pathway = Pathway) %>%
# keep only pathways with at least 20 annotated genes
filter(N >= 20) %>%
select(kegg_pathway, condition, P.DE) %>%
mutate(log10_p_value = -log10(P.DE), .keep = "unused") %>%
# shorten pathway names for the axis labels
mutate(kegg_pathway = paste0(str_sub(kegg_pathway, 1, 25), "..")) %>%
# make correlation plot: pathways as rows, conditions as columns
pivot_wider(names_from = condition, values_from = log10_p_value) %>%
column_to_rownames(var = "kegg_pathway") %>% as.matrix %>%
corrplot(is.corr = FALSE, tl.col = grey(0.5), tl.cex = 0.8,
col = colorRampPalette(custom_colors[c(1,5,2)])(10), col.lim = c(0, 20))
# generate color palette for heatmap
heat_cols <- colorspace::diverging_hcl(n = 7, h = c(255, 12), c = c(50, 80), l = c(20, 97), power = c(1, 1.3))
# create a matrix-like df with wide fitness data for plotting heatmap
df_heatmap <- df_gene %>%
filter(time == 0, !is.na(locus)) %>%
select(locus, condition, wmean_fitness) %>%
mutate(wmean_fitness = wmean_fitness %>% replace(., . > 8, 8) %>% replace(., . < -8, -8)) %>%
pivot_wider(names_from = condition, values_from = wmean_fitness) %>%
column_to_rownames(var = "locus")
# subset of df with *strongly changed* genes
df_heatmap2 <- df_heatmap %>%
filter(if_any(.cols = matches("[HL]C, "), ~ !between(., -4, 4)))
# create cluster for reordering
mat_cluster <- df_heatmap %>% as.matrix %>% dist %>% hclust(method = "ward.D2")
mat_heatmap <- df_heatmap %>% as.matrix %>%
.[order.dendrogram(as.dendrogram(mat_cluster)), ncol(.):1]
# repeat this step with the subset of *strongly changed* genes
mat_cluster_sig <- df_heatmap2 %>% as.matrix %>% dist %>% hclust(method = "ward.D2")
mat_heatmap_sig <- df_heatmap2 %>% as.matrix %>%
.[order.dendrogram(as.dendrogram(mat_cluster_sig)), ncol(.):1]
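The capping and reordering steps above can be illustrated on a small toy matrix with hypothetical values, using base R only. Values are clipped to the range from -8 to 8 with pmin() and pmax(), which has the same effect as the two replace() calls. Genes are then clustered with Ward's method (ward.D2) on Euclidean distances and reordered by the dendrogram leaf order, so that genes with similar fitness profiles end up next to each other in the heatmap. The column order is reversed, as in the notebook.

```r
# toy fitness matrix (hypothetical values): 6 genes x 3 conditions
fitness <- matrix(
  c( 9.5,  7.0,  6.5,
     8.8,  7.2,  6.0,
    -6.0, -9.0, -7.5,
    -5.5, -8.7, -7.0,
     0.5,  0.2, -0.3,
     0.1, -0.4,  0.2),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), c("cond1", "cond2", "cond3"))
)

# cap values at -8 and 8; pmin/pmax keep the matrix dimensions and names
fitness_capped <- pmax(pmin(fitness, 8), -8)

# Ward clustering on Euclidean distances between genes (rows)
hc <- hclust(dist(fitness_capped), method = "ward.D2")

# reorder rows by dendrogram leaf order and reverse the column order
row_order <- order.dendrogram(as.dendrogram(hc))
fitness_ordered <- fitness_capped[row_order, ncol(fitness_capped):1]
print(rownames(fitness_ordered))
```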
Now we plot a heatmap of all genes, a heatmap of the subset of strongly changed genes (absolute fitness above 4 in at least one condition), and a dendrogram of the hierarchical clustering. The result is hard to interpret: with some exceptions, most genes fall into broad, unspecific clusters that do not reveal clear relationships between treatment variables and fitness outcome.
plot_heatmap_all <- levelplot(mat_heatmap,
par.settings = custom.colorblind(),
col.regions = colorRampPalette(heat_cols)(20),
at = seq(-8, 8, 1), aspect = "fill",
xlab = paste0("genes (", nrow(mat_heatmap),")"),
ylab = "", scales = list(x = list(draw = FALSE)),
panel = function(x, y, z, ...) {
panel.levelplot(x, y, z, ...)
panel.abline(h = 1:(ncol(mat_heatmap) - 1) + 0.5, col = "white", lwd = 1.5)
}
)
print(plot_heatmap_all)
save_plot(plot_heatmap_all, width = 8, height = 2.5)
plot_heatmap_sig <- levelplot(mat_heatmap_sig,
par.settings = custom.colorblind(),
col.regions = colorRampPalette(heat_cols)(20),
at = seq(-8, 8, 1), aspect = "fill",
xlab = paste0("genes (", nrow(mat_heatmap_sig),")"),
ylab = "", scales = list(x = list(draw = FALSE)),
panel = function(x, y, z, ...) {
panel.levelplot(x, y, z, ...)
panel.abline(h = 1:ncol(mat_heatmap_sig)+0.5, col = "white", lwd = 1.5)
}
)
plot_cluster_dend <- mat_cluster_sig %>% as.dendrogram %>%
set("branches_k_col", custom_colors[1:5], k = 5) %>%
set("branches_lwd", 0.5) %>%
as.ggdend %>%
ggplot(labels = FALSE)
gridExtra::grid.arrange(
# coords for unit: top, right, bottom, left
plot_cluster_dend +
theme(plot.margin = unit(c(0.1, 0.075, -0.26, 0.135),"npc")),
plot_heatmap_sig,
nrow = 2
)
We apply two dimensionality reduction methods, non-metric multidimensional scaling (nMDS) and t-distributed stochastic neighbor embedding (t-SNE), to check whether they reproduce the hclust clustering of the strongly changed genes. Both methods show that the small clusters are separated more strongly from the rest.
# set a seed to obtain same pattern for stochastic methods
set.seed(123)
# run nMDS analysis
NMDS <- df_heatmap2 %>% as.matrix %>% dist %>% metaMDS
Run 0 stress 0.08477877
Run 1 stress 0.08485663
... Procrustes: rmse 0.002551692 max resid 0.04275329
Run 2 stress 0.08484567
... Procrustes: rmse 0.001445249 max resid 0.01754732
Run 3 stress 0.09041578
Run 4 stress 0.09773472
Run 5 stress 0.09766743
Run 6 stress 0.09766712
Run 7 stress 0.09766695
Run 8 stress 0.097774
Run 9 stress 0.08581093
Run 10 stress 0.08477915
... Procrustes: rmse 0.0001249276 max resid 0.00204171
... Similar to previous best
Run 11 stress 0.09034496
Run 12 stress 0.08477945
... Procrustes: rmse 0.0001738428 max resid 0.002843804
... Similar to previous best
Run 13 stress 0.08477977
... Procrustes: rmse 0.0002175779 max resid 0.003564318
... Similar to previous best
Run 14 stress 0.08477903
... Procrustes: rmse 0.0001075038 max resid 0.001756538
... Similar to previous best
Run 15 stress 0.08484495
... Procrustes: rmse 0.001423391 max resid 0.01739739
Run 16 stress 0.08485667
... Procrustes: rmse 0.002551594 max resid 0.0427536
Run 17 stress 0.08485605
... Procrustes: rmse 0.002540429 max resid 0.04264689
Run 18 stress 0.08477984
... Procrustes: rmse 0.0002060804 max resid 0.003370265
... Similar to previous best
Run 19 stress 0.09041548
Run 20 stress 0.09286828
*** Solution reached
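The stress values in the log above measure how well the two-dimensional nMDS configuration preserves the rank order of the original distances. metaMDS() uses the monoMDS() engine of vegan by default, which, as we understand the vegan documentation, reports Kruskal's stress formula 1:

$$
S_1 = \sqrt{\frac{\sum_{i<j} \left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}}
$$

Here $d_{ij}$ is the distance between genes $i$ and $j$ in the ordination, and $\hat{d}_{ij}$ is the value fitted to it by monotone (isotonic) regression on the original dissimilarities. Lower values indicate a better fit. The best solution here has a stress of about 0.085, which falls below 0.1, a commonly quoted rule-of-thumb threshold for a good ordination.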
df_nmds <- NMDS$points %>% as_tibble(rownames = "locus") %>%
left_join(enframe(cutreeord(mat_cluster_sig, k = 5), "locus", "cluster"), by = "locus")
# run t-SNE analysis
SNE <- tsne::tsne(df_heatmap2 %>% as.matrix %>% dist)
sigma summary: Min. : 0.51453154858122 |1st Qu. : 0.646910330520664 |Median : 0.714309515627356 |Mean : 0.746309912193634 |3rd Qu. : 0.79479345979067 |Max. : 1.58041817397143 |
Epoch: Iteration #100 error is: 14.170258824269
Epoch: Iteration #200 error is: 0.423942816192504
Epoch: Iteration #300 error is: 0.406398188540306
Epoch: Iteration #400 error is: 0.404698129239104
Epoch: Iteration #500 error is: 0.404337884848916
Epoch: Iteration #600 error is: 0.404205045213674
Epoch: Iteration #700 error is: 0.404146174344848
Epoch: Iteration #800 error is: 0.404117696326219
Epoch: Iteration #900 error is: 0.404102464318572
Epoch: Iteration #1000 error is: 0.404093843400558
# tsne() returns an unnamed two-column matrix of coordinates
df_tsne <- tibble(V1 = SNE[, 1], V2 = SNE[, 2], locus = rownames(df_heatmap2)) %>%
left_join(enframe(cutreeord(mat_cluster_sig, k = 5), "locus", "cluster"), by = "locus")
plot_nmds <- df_nmds %>%
ggplot(aes(x = MDS1, y = MDS2, color = factor(cluster))) +
geom_point(size = 2) + labs(title = "nMDS") +
custom_theme(legend.position = c(0.85, 0.78)) +
scale_color_manual(values = custom_colors)
plot_tsne <- df_tsne %>%
ggplot(aes(x = V1, y = V2, color = factor(cluster))) +
geom_point(size = 2) + labs(title = "t-SNE") +
custom_theme(legend.position = c(0.85, 0.78)) +
scale_color_manual(values = custom_colors)
gridExtra::grid.arrange(plot_nmds, plot_tsne, ncol = 2)
ggsave("../figures/plot_nmds_tsne.svg",
plot = gridExtra::arrangeGrob(plot_nmds, plot_tsne, ncol = 2),
device = "svg", width = 8, height = 4)
We can find clusters of genes with similar fitness, but it is also important to understand why they cluster together. To find out which variables determine the fitness outcome of a gene, we can perform multiple linear regression. For this, the fitness values of each gene are annotated with the condition variables carbon, light, and treatment. The treatment variable can be subdivided into individual treatment columns such as glucose (+G), DCMU (+D), fluctuating light (+FL), and so on. Multiple linear regression fits a linear model of the following form to the data:
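$$
\text{response} = \beta_0 + \beta_A x_A + \beta_B x_B + \dots + \varepsilon
$$
where $\beta_0$ is the intercept, $x_A, x_B, \dots$ are the predictors, $\beta_A, \beta_B, \dots$ are their slopes, and $\varepsilon$ is the residual error.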
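Here, fitness is the response variable and the condition variables are the predictors. The categorical predictors must first be converted into numerical (dummy) variables: carbon is coded as HC = 1 and LC = 0, light as LL = 0, IL = 0.5 and HL = 1, and each treatment as a 0/1 indicator column. One multiple linear model is then fitted per gene, and the coefficient (slope) of each predictor, its p-value, and the R² of the model are extracted to quantify how well each variable predicts fitness.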
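A minimal sketch of the per-gene regression on simulated data, assuming a hypothetical gene measured under 12 conditions with known effects of carbon, light, and glucose. It uses the same coding as the notebook (HC = 1, LC = 0; LL = 0, IL = 0.5, HL = 1; treatments as 0/1 dummies) and shows that lm() recovers the simulated slopes, with p-values and R² taken from summary().

```r
set.seed(1)

# hypothetical design: one gene measured under 12 conditions
design <- data.frame(
  carbon  = rep(c(1, 0), each = 6),        # HC = 1, LC = 0
  light   = rep(c(0, 0.5, 1), times = 4),  # LL = 0, IL = 0.5, HL = 1
  glucose = rep(c(0, 1), times = 6)        # dummy: +G absent (0) or present (1)
)

# simulated fitness with known slopes plus small Gaussian noise
design$fitness <- 0.5 + 2 * design$carbon - 3 * design$light +
  1.5 * design$glucose + rnorm(12, sd = 0.1)

fit  <- lm(fitness ~ carbon + light + glucose, data = design)
est  <- coef(fit)                                  # intercept and slopes
pval <- summary(fit)$coefficients[, "Pr(>|t|)"]    # t-test p-value per term
r2   <- summary(fit)$r.squared                     # variance explained

print(round(est, 2))
```

Because the simulated design is balanced, the estimated slopes land close to the true values of 2, -3, and 1.5.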
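# fixed model with 6 predictor variables; a dynamic formula would
# be better in the future
# returns 15 values: 7 coefficients (intercept + 6 slopes), 7 p-values,
# and R^2; assumes that no predictor is aliased (constant) for a gene
fit_linreg <- function(y, x1, x2, x3, x4, x5, x6){
fit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6)
c(coefficients(fit), summary(fit)$coefficients[, 4],
summary(fit)$r.squared)
}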
# recode categorical to numerical (dummy) variables
df_linreg <- df_gene %>%
filter(!is.na(locus)) %>%
select(locus, carbon, light, treatment, wmean_fitness) %>% distinct %>%
mutate(
carbon = recode(carbon, `HC` = 1, `LC` = 0),
light = recode(light, `LL` = 0, `IL` = 0.5, `HL` = 1)) %>%
mutate(dummy = 1, treatment = replace(treatment, treatment == "", "-")) %>%
pivot_wider(names_from = treatment, values_from = dummy, values_fill = 0) %>%
mutate(`+G` = `+G` + `+D, +G`) %>% rename(`+D` = `+D, +G`) %>% select(-`-`) %>%
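# fit one model per gene; fit_linreg() returns 15 values, which
# summarize() turns into 15 rows per gene (dplyr >= 1.1.0 warns about this)
group_by(locus) %>%
summarize(coefficient = fit_linreg(wmean_fitness, carbon, light, `-N`, `+FL`, `+G`, `+D`),
.groups = "keep") %>%
# label the rows: coefficients, p-values (prefix "pval_"), and R^2
mutate(treatment = c(rep(c("intercept", "carbon", "light", "-N", "+FL", "+G", "+D"), 2) %>%
paste0(rep(c("", "pval_"), each = 7), .), "r_squared"))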
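When coefficients and p-values from lm() are collected into a single vector, as in fit_linreg(), one pitfall is worth knowing. If a predictor happens to be constant for a gene (for example, if some conditions are missing for it), lm() reports its coefficient as NA, but summary() drops that term from its coefficient table. The combined vector then has fewer elements than expected, and fixed labels no longer line up with the values. The sketch below reproduces this on hypothetical toy data and shows one way to align the p-values with the full coefficient vector. safe_pvalues() is an illustrative helper and is not part of the notebook.

```r
# hypothetical toy data: x2 never varies (e.g. a treatment absent for this gene)
toy <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1), x1 = 1:5, x2 = 0)
fit <- lm(y ~ x1 + x2, data = toy)

print(coef(fit))                   # includes x2 with an NA estimate
print(summary(fit)$coefficients)   # x2 is missing from this table

# hypothetical helper: p-values aligned to coef(fit), NA for aliased terms
safe_pvalues <- function(fit) {
  est <- coef(fit)
  p <- setNames(rep(NA_real_, length(est)), names(est))
  s <- summary(fit)$coefficients
  p[rownames(s)] <- s[, "Pr(>|t|)"]
  p
}

print(safe_pvalues(fit))
```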
Now we can overlay the regression coefficient of each predictor variable on the t-SNE map, with one panel per variable. This way we can identify groups of genes whose fitness is affected to a similar degree by the same variables. Genes with an absolute coefficient above 2 are labeled.
plot_tsne_linreg <- df_tsne %>%
inner_join(df_linreg, by = "locus") %>%
left_join(select(df_gene, locus, sgRNA_target) %>% distinct, by = "locus") %>%
filter(!str_detect(treatment, "intercept|pval|r_squared")) %>%
mutate(sgRNA_target = if_else(abs(coefficient) > 2, sgRNA_target, "")) %>%
mutate(point_size = abs(coefficient),
coefficient = coefficient %>% replace(., . > 5, 5) %>% replace(., . < -5, -5)) %>%
ggplot(aes(x = V1, y = V2, size = point_size,
color = coefficient, label = sgRNA_target)) +
geom_point() +
labs(title = "t-SNE clustering of DF genes",
subtitle = paste0("dot color/size encodes effect of variable, n = ", nrow(df_tsne))) +
custom_theme(aspect = 1) +
scale_color_gradientn(limits = c(-5, 5),
colours = c(custom_colors[1], grey(0.6, 0.8), custom_colors[2])) +
scale_size_continuous(range = c(0.5, 7)) +
geom_text_repel(size = 3, max.overlaps = 50) +
facet_wrap( ~ treatment, ncol = 2)
print(plot_tsne_linreg)
This strategy reveals a list of interesting condition-specific genes:
- ssr3532: unknown short protein; its strongest known interaction in STRING is with the glutaminase GlsA
- sll1521: putative diflavin flavoprotein A3 (dfa3); negatively correlated with fitness
- sll0217: putative diflavin flavoprotein A2 (dfa2); knockdown fitness negatively correlated with carbon (HC) and positively correlated with fluctuating light (+FL)
- sll0593: glk, glucokinase; catalyzes the phosphorylation of glucose (Glc) to glucose-6-phosphate (G6P)
- sll1533: pilT; fimbria assembly and motility; possibly involved in glucose transport or sensing?
- ssl3364: unknown short protein; strongly interacts with RbcX, RbcR, and Prk; important for adaptation of carbon metabolism?
- ssr2142: ycf19, short unknown protein; interacts with PsbO and the Tat membrane protein insertion system
- slr0963: sir, ferredoxin-dependent sulfite reductase (HS⁻ + 3 H2O + 6 oxidized ferredoxin <-> sulfite + 6 reduced ferredoxin + 7 H⁺); strongly interacts with other proteins of sulfur metabolism, specifically those related to the biosynthesis of the cofactors cobalamin (vitamin B12) and siroheme
- sll0218: same behavior as dfa2; interacts with dfa2 and dfa4; contributes to PSII stabilization (Bersanini et al., 2017)

The table of linear regression coefficients and p-values is reshaped to wide format (one row per gene) for better readability. It is then filtered for genes with an absolute coefficient above 2 for at least one variable. The kableExtra package is used to color cells for easier recognition. We then subset the table for each variable to spot the most interesting genes.
df_linreg_wide <- df_linreg %>%
pivot_wider(names_from = treatment, values_from = coefficient) %>%
left_join(select(df_gene, locus, sgRNA_target) %>% distinct, by = "locus") %>%
select(-matches("intercept")) %>%
filter(if_any(matches("^(carb|light|\\-|\\+)"), ~ abs(.) > 2)) %>%
mutate(across(matches("carb|light|\\-|\\+"), ~ round(., 3))) %>%
ungroup %>% select(sgRNA_target, locus, matches("."))
color_table <- function(df, variable) {
filter(df, abs(.data[[variable]]) > 2) %>%
select(matches("^(sg|loc|r_s|carb|light|\\-|\\+)") | all_of(paste0("pval_", variable))) %>%
arrange(desc(.data[[variable]])) %>%
# color the coefficient columns (carbon to +D) on a fixed scale
mutate(across(3:8, ~ cell_spec(., "html", color = "white",
background = spec_color(., option = "E", scale_from = c(-5.5, 5.5)),
bold = TRUE))) %>%
kbl(format = "html", escape = F) %>%
kable_paper("striped", full_width = F)
}
df_linreg_wide %>% color_table("carbon")
| sgRNA_target | locus | carbon | light | -N | +FL | +G | +D | r_squared | pval_carbon |
|---|---|---|---|---|---|---|---|---|---|
| sll0364 | sll0364 | 2.834 | -2.812 | -0.108 | 1.758 | 0.601 | 3.578 | 0.8717770 | 0.024 |
| slr1095 | slr1095 | 2.429 | -2.1 | -1.284 | -1.592 | -0.274 | 1.777 | 0.6370579 | 0.083 |
| sll1734 | sll1734 | 2.426 | -1.267 | -0.478 | -1.625 | 0.837 | 1.183 | 0.6560893 | 0.099 |
| slr0211 | slr0211 | 2.354 | 0.469 | -0.044 | -0.362 | 0.786 | 1.884 | 0.8217569 | 0.022 |
| ftsZ | sll1633 | 2.34 | -2.314 | -1.677 | -1.439 | 0.318 | 2.136 | 0.9249829 | 0.006 |
| ndhD3 | sll1733 | 2.326 | -1.213 | -0.489 | -1.392 | 0.67 | 1.172 | 0.6640536 | 0.085 |
| ssr3532 | ssr3532 | 2.303 | -1.661 | -3.733 | -1.464 | 0.702 | 1.64 | 0.6071408 | 0.163 |
| ndhF2 | sll1732 | 2.163 | -0.874 | -0.602 | -1.431 | 1.022 | 1.124 | 0.6649520 | 0.103 |
| slr1818 | slr1818 | 2.159 | -1.144 | -1.369 | -1.55 | -0.087 | 1.675 | 0.5831706 | 0.116 |
| sll0488 | sll0488 | 2.151 | -1.322 | -1.418 | -1.434 | 0.047 | 1.624 | 0.6312515 | 0.090 |
| sll0481 | sll0481 | 2.141 | -3.371 | -1.601 | -1.335 | 0.485 | 1.056 | 0.9686290 | 0.002 |
| sll0995 | sll0995 | 2.101 | -0.278 | -1.933 | -1.322 | 0.722 | 1.384 | 0.6077491 | 0.132 |
| cmpB | slr0041 | 2.041 | -1.904 | -0.766 | -1.032 | 0.133 | 1.15 | 0.5577051 | 0.126 |
| sir | slr0963 | -2.034 | 4.169 | 1.41 | 1.489 | -0.345 | -0.303 | 0.8364473 | 0.048 |
| rpl24 | sll1807 | -2.081 | 0.994 | 0.573 | 1.814 | 0.023 | -1.225 | 0.4871941 | 0.200 |
| rps9 | sll1822 | -2.087 | 0.874 | 0.64 | 1.319 | -0.719 | -0.988 | 0.5755515 | 0.137 |
| slr6107 | slr6107 | -2.108 | 1.215 | 1.13 | 1.151 | -0.237 | -1.246 | 0.5493324 | 0.124 |
| slr0272 | slr0272 | -2.122 | 1.596 | 1.27 | 1.27 | -1.038 | -0.791 | 0.6159525 | 0.127 |
| leuD | sll1444 | -2.209 | 2.505 | -0.124 | 1.64 | -1.451 | -0.038 | 0.6714334 | 0.146 |
| rps17 | ssl3437 | -2.241 | 0.79 | 1.027 | 1.922 | -0.148 | -1.382 | 0.5117852 | 0.184 |
| slr0007 | slr0007 | -2.308 | 1.932 | 0.695 | 2.012 | -0.851 | -1.068 | 0.6792795 | 0.109 |
| slr1938 | slr1938 | -2.311 | 0.268 | 0.592 | 1.075 | -0.772 | -1.967 | 0.7489109 | 0.051 |
| rpl35 | ssl1426 | -2.361 | 1.851 | 1.677 | 1.293 | -0.482 | -1.844 | 0.5899106 | 0.124 |
| sll0218 | sll0218 | -2.376 | 1.766 | 0.727 | 1.766 | -0.193 | -1.1 | 0.6105884 | 0.110 |
| sll0217 | sll0217 | -2.418 | 1.723 | 0.954 | 2.228 | -0.286 | -1.152 | 0.6281035 | 0.118 |
| sll0933 | sll0933 | -2.634 | 1.271 | 0.081 | 2.115 | -0.633 | -0.702 | 0.6558952 | 0.105 |
| slr1245 | slr1245 | -2.694 | 1.826 | 1.245 | -0.805 | 1.092 | -3.224 | 0.8325233 | 0.019 |
| rps15 | ssl1784 | -2.769 | 1.647 | 1.312 | 2.437 | -0.758 | -1.499 | 0.7223751 | 0.075 |
df_linreg_wide %>% color_table("light")
| sgRNA_target | locus | carbon | light | -N | +FL | +G | +D | r_squared | pval_light |
|---|---|---|---|---|---|---|---|---|---|
| apcE | slr0335 | -0.681 | 5.071 | 0.314 | 1.093 | 3.962 | 1.65 | 0.8255851 | 0.055 |
| apcA | slr2067 | 0.027 | 4.441 | 0.576 | 0.809 | 3.301 | 2.922 | 0.7380667 | 0.127 |
| sll1878 | sll1878 | -1.585 | 4.291 | 2.016 | 2.165 | 1.301 | -0.213 | 0.7988744 | 0.024 |
| sir | slr0963 | -2.034 | 4.169 | 1.41 | 1.489 | -0.345 | -0.303 | 0.8364473 | 0.031 |
| cpcB | sll1577 | 0.12 | 4.032 | 0.141 | 0.74 | 2.544 | 2.476 | 0.8276584 | 0.053 |
| murC | slr1423 | 0.232 | 3.875 | 0.345 | 0.027 | 1.164 | -0.248 | 0.6419118 | 0.096 |
| sll1378 | sll1378 | -0.955 | 3.701 | -0.4 | 1.2 | 2.782 | 0.937 | 0.9545533 | 0.006 |
| sll1879 | sll1879 | -0.306 | 3.662 | -0.894 | -0.217 | 0.681 | -0.74 | 0.7312512 | 0.074 |
| hitB | slr0327 | -1.726 | 3.606 | 1.56 | 1.726 | 0.576 | -0.39 | 0.6687398 | 0.087 |
| cpcA | sll1578 | 0.11 | 3.57 | 0.261 | 0.609 | 2.255 | 2.162 | 0.8434275 | 0.044 |
| slr0947 | slr0947 | 0.18 | 3.422 | 0.454 | 0.742 | 1.52 | 0.613 | 0.7928550 | 0.028 |
| slr1990 | slr1990 | -0.768 | 3.219 | 0.197 | 0.994 | 2.472 | 2.429 | 0.8920252 | 0.044 |
| slr1102 | slr1102 | 0.056 | 3.162 | -0.118 | 0.511 | 2.737 | 0.819 | 0.8207709 | 0.065 |
| amiC | slr0447 | -1.235 | 3.159 | 0.549 | -0.175 | 1.309 | 0.402 | 0.9300166 | 0.007 |
| sll6055 | sll6055 | -1.122 | 3.123 | 0.558 | 0.514 | 2.406 | 2.128 | 0.8376624 | 0.091 |
| cyp2 | slr0574 | -0.6 | 3.073 | 0.867 | 1.027 | 2.347 | -0.438 | 0.9444943 | 0.003 |
| sll1945 | sll1945 | -0.194 | 3.072 | 0.642 | 0.625 | 0.063 | 1.299 | 0.7068087 | 0.059 |
| sll0689 | sll0689 | -0.125 | 3.062 | 0.19 | 0.41 | 0.027 | 0.162 | 0.5002592 | 0.178 |
| narB | sll1454 | -0.223 | 3.011 | 0.044 | 0.014 | 0.034 | 0.634 | 0.8309629 | 0.025 |
| def | slr1549 | 0.929 | 2.973 | 1.091 | -0.039 | 1.821 | 0.97 | 0.7394573 | 0.103 |
| slr7096 | slr7096 | -0.452 | 2.972 | -0.83 | 0.696 | -0.16 | 0.433 | 0.9130593 | 0.010 |
| psbJ | smr0008 | 0.192 | 2.968 | 0.293 | 0.419 | 3.216 | 3.222 | 0.8923081 | 0.090 |
| slr1505 | slr1505 | -0.682 | 2.926 | 0.456 | 0.864 | 2.222 | 3.068 | 0.9263147 | 0.032 |
| slr0483 | slr0483 | 0.315 | 2.922 | 0.533 | 0.633 | 1.047 | 0.53 | 0.9363725 | 0.003 |
| nirA | slr0898 | -1.083 | 2.919 | -0.106 | 0.515 | -0.343 | -0.004 | 0.7628800 | 0.063 |
| apcB | slr1986 | 0.627 | 2.876 | 0.319 | 0.432 | 2.241 | 2.188 | 0.7619664 | 0.129 |
| slr0734 | slr0734 | 0.52 | 2.819 | 0.607 | 0.753 | 1.99 | 2.183 | 0.8211958 | 0.067 |
| slr1170 | slr1170 | -0.807 | 2.766 | -0.676 | 0.308 | 0.668 | -0.598 | 0.7363329 | 0.070 |
| ycf38 | sll0760 | -0.445 | 2.748 | -0.294 | 0.164 | 0.823 | 1.469 | 0.6739376 | 0.122 |
| ssl0331 | ssl0331 | -1.177 | 2.738 | 1.195 | 1.389 | 0.463 | -1.187 | 0.8598838 | 0.019 |
| slr1693 | slr1693 | -1.723 | 2.719 | 0.576 | 0.898 | -0.397 | -1.131 | 0.8525814 | 0.045 |
| psbO | sll0427 | 0.64 | 2.707 | 0.222 | 0.633 | 3.339 | 1.65 | 0.8838887 | 0.078 |
| psbD | sll0849 | -0.861 | 2.664 | 1.002 | -0.005 | 3.231 | 3.101 | 0.8557861 | 0.192 |
| slr1692 | slr1692 | 0.437 | 2.655 | 0.33 | 0.811 | 1.705 | 1.623 | 0.8429081 | 0.039 |
| hemB | sll1994 | -0.063 | 2.652 | -0.914 | 0.697 | 0.646 | 0.836 | 0.6646820 | 0.110 |
| slr2042 | slr2042 | -0.364 | 2.633 | 0.809 | 1.148 | 1.345 | -1.302 | 0.8793763 | 0.008 |
| cpcG | slr2051 | -0.205 | 2.621 | 0.183 | 0.472 | 2.294 | 1.485 | 0.9132484 | 0.027 |
| slr1302 | slr1302 | -0.999 | 2.591 | 0.061 | 0.454 | 1.365 | -0.191 | 0.8534901 | 0.026 |
| moeB | sll1536 | -1.04 | 2.579 | 1.069 | 0.234 | -0.366 | -0.368 | 0.7776446 | 0.054 |
| sll0148 | sll0148 | 0.357 | 2.556 | -0.37 | -0.143 | 1.982 | 0.077 | 0.7719391 | 0.094 |
| plsX | slr1510 | 0.976 | 2.549 | 0.169 | -0.006 | 0.186 | 1.868 | 0.6005493 | 0.205 |
| sll6109 | sll6109 | -0.938 | 2.516 | 0.586 | 0.717 | 0.079 | 0.159 | 0.8704727 | 0.013 |
| cysH | slr1791 | -1.968 | 2.512 | 0.674 | 1.242 | 0.068 | -0.998 | 0.8428947 | 0.057 |
| leuD | sll1444 | -2.209 | 2.505 | -0.124 | 1.64 | -1.451 | -0.038 | 0.6714334 | 0.313 |
| trxA3 | slr0623 | -1.336 | 2.493 | 1.078 | 1.881 | 0.16 | -0.637 | 0.8097690 | 0.057 |
| nrtB | sll1451 | -0.107 | 2.485 | 0.272 | 0.388 | -0.244 | 1.229 | 0.8884861 | 0.012 |
| slr1841 | slr1841 | -0.166 | 2.451 | 1.171 | 0.655 | 0.131 | 0.426 | 0.7466448 | 0.040 |
| sll2003 | sll2003 | -0.423 | 2.396 | 0.519 | 0.236 | 1.625 | 0.597 | 0.8779608 | 0.022 |
| drgA | slr1719 | -1.649 | 2.368 | 0.596 | 1.457 | -0.317 | -0.446 | 0.6695048 | 0.187 |
| sll1380 | sll1380 | -0.367 | 2.364 | 0.551 | 0.741 | 1.707 | 0.674 | 0.9376518 | 0.006 |
| petH | slr1643 | -1.23 | 2.351 | 1.048 | 1.592 | 2.039 | -2.97 | 0.7531722 | 0.109 |
| sll1500 | sll1500 | -1.256 | 2.346 | -0.004 | 0.248 | -0.861 | -0.634 | 0.7141056 | 0.150 |
| sll1304 | sll1304 | -0.655 | 2.334 | 0.667 | 0.558 | 0.589 | 0.622 | 0.8743538 | 0.010 |
| sll0301 | sll0301 | 0.045 | 2.33 | -0.396 | -0.094 | 2.231 | 0.605 | 0.9108020 | 0.034 |
| slr0950 | slr0950 | -0.06 | 2.276 | 0.509 | 0.314 | 1.55 | 0.089 | 0.8557459 | 0.020 |
| hemC | slr1887 | -0.702 | 2.265 | 0.206 | 0.998 | 0.118 | 0.461 | 0.6255683 | 0.111 |
| trpF | sll0356 | 0.03 | 2.257 | 0.118 | 0.505 | -0.803 | 1.243 | 0.7586823 | 0.082 |
| ndhF | slr2009 | 0.221 | 2.237 | 0.905 | 0.613 | 2.258 | 3.08 | 0.9355070 | 0.043 |
| slr1042 | slr1042 | 0.571 | 2.208 | 0.633 | 0.498 | 0.785 | 0.821 | 0.5506013 | 0.163 |
| proC | slr0661 | -0.746 | 2.205 | 0.115 | -0.077 | 1.399 | 1.48 | 0.8763769 | 0.057 |
| rpiA | slr0194 | -0.278 | 2.194 | 1.674 | 1.505 | 2.476 | -1.452 | 0.9605876 | 0.003 |
| thrA | sll0455 | 0.548 | 2.169 | -0.427 | -0.459 | 0.803 | 0.359 | 0.6473974 | 0.170 |
| glnA | slr1756 | -0.699 | 2.114 | 0.035 | 0.284 | -1.379 | 0.004 | 0.8295902 | 0.089 |
| slr0519 | slr0519 | -1.506 | 2.09 | 1.043 | 1.988 | 0.287 | -1.1 | 0.6946988 | 0.190 |
| ssl1918 | ssl1918 | -0.063 | 2.086 | 0.326 | 0.449 | 1.257 | 0.418 | 0.7483282 | 0.054 |
| sll0847 | sll0847 | -0.875 | 2.075 | -0.013 | 0.095 | 0.622 | 0.374 | 0.7363211 | 0.083 |
| ccmK4 | slr1839 | -0.233 | 2.073 | 0.584 | 0.413 | 2.248 | 0.326 | 0.7885394 | 0.106 |
| pdhB | sll1721 | -0.862 | 2.072 | 1.323 | 1.246 | 0.616 | -0.273 | 0.7513861 | 0.044 |
| sll1025 | sll1025 | 0.419 | 2.061 | 0.377 | 0.473 | 1.319 | 0.495 | 0.7599508 | 0.059 |
| prc | slr1751 | 0.376 | 2.057 | 1.334 | 0.611 | 0.429 | 0.649 | 0.5909157 | 0.136 |
| slr0771 | slr0771 | -0.084 | 2.042 | 0.311 | 0.342 | 0.239 | -0.095 | 0.7574635 | 0.037 |
| tufA | sll1099 | -1.461 | 2.041 | 1.399 | 2.218 | 0.036 | -1.078 | 0.7057584 | 0.217 |
| fabF | sll1069 | 0.361 | 2.04 | 1.575 | 0.069 | 0.345 | -0.025 | 0.8170498 | 0.048 |
| aroB | slr2130 | -1.206 | 2.02 | -0.656 | 0.772 | -0.523 | -0.045 | 0.8102623 | 0.094 |
| nrtA | sll1450 | -0.214 | 2.019 | 0.246 | 0.238 | -0.409 | 1.571 | 0.8602072 | 0.026 |
| hhoB | sll1427 | -0.04 | 2.017 | -0.198 | 0.237 | 2.207 | 0.158 | 0.7804217 | 0.129 |
| slr0813 | slr0813 | 1.839 | -2.009 | 0.304 | 0.354 | -0.981 | 0.595 | 0.7628332 | 0.187 |
| slr1783 | slr1783 | -0.973 | -2.014 | -0.805 | -0.173 | -0.65 | 0.265 | 0.8726587 | 0.038 |
| sll0176 | sll0176 | -0.408 | -2.085 | -2.021 | -1.717 | -1.526 | 0.952 | 0.6551709 | 0.151 |
| slr1095 | slr1095 | 2.429 | -2.1 | -1.284 | -1.592 | -0.274 | 1.777 | 0.6370579 | 0.325 |
| clpP4 | sll0534 | -0.193 | -2.123 | -1.24 | -0.652 | -0.922 | -0.194 | 0.7367912 | 0.042 |
| sll5063 | sll5063 | -1.163 | -2.128 | -0.35 | 0.218 | -0.272 | 0.018 | 0.8440277 | 0.070 |
| mraY | sll0657 | 1.211 | -2.188 | -0.758 | -0.99 | -0.843 | 0.787 | 0.8718734 | 0.017 |
| sll0162 | sll0162 | 0.229 | -2.203 | -0.403 | -0.234 | -0.405 | 0.062 | 0.6347022 | 0.078 |
| fabF2 | slr1332 | 0.401 | -2.251 | 0.251 | 0.187 | -2.097 | 0.38 | 0.9209766 | 0.021 |
| slr1098 | slr1098 | 0.903 | -2.261 | -0.199 | -0.434 | 0.479 | -2.543 | 0.4195152 | 0.403 |
| sll1757 | sll1757 | 0.783 | -2.271 | -0.494 | -0.752 | -1.562 | -0.125 | 0.5383796 | 0.200 |
| ftsZ | sll1633 | 2.34 | -2.314 | -1.677 | -1.439 | 0.318 | 2.136 | 0.9249829 | 0.039 |
| gcvP | slr0293 | 1.861 | -2.335 | -1.541 | -0.695 | -0.558 | 1.696 | 0.9451148 | 0.007 |
| slr0484 | slr0484 | 0.087 | -2.342 | -0.79 | 0.038 | -0.367 | -0.061 | 0.7395823 | 0.049 |
| ssr0657 | ssr0657 | 0.935 | -2.358 | -0.659 | -0.992 | -2.021 | 1.496 | 0.9170055 | 0.009 |
| sll1915 | sll1915 | 0.381 | -2.385 | -0.147 | -0.431 | -1.437 | -0.118 | 0.6397709 | 0.110 |
| dnaJ4 | sll0897 | 0.955 | -2.471 | -0.087 | 0.165 | -0.67 | 1.728 | 0.9304110 | 0.007 |
| psaK | ssr0390 | 1.17 | -2.479 | -1.404 | -0.784 | -0.729 | 0.717 | 0.6971797 | 0.062 |
| ppiB | sll0227 | 0.735 | -2.528 | -1.035 | -0.173 | -0.759 | 0.306 | 0.8520480 | 0.013 |
| slr0643 | slr0643 | 0.127 | -2.545 | -0.449 | 0.098 | -0.647 | 0.279 | 0.5529642 | 0.140 |
| sll0498 | sll0498 | 0.407 | -2.577 | -1.166 | -1.108 | -2.148 | -1.967 | 0.8482665 | 0.063 |
| mrdB | slr1267 | 0.179 | -2.655 | -0.252 | -0.24 | -1.365 | 0.704 | 0.8603778 | 0.014 |
| rpoE | slr1545 | 0.874 | -2.759 | -0.425 | 0.6 | -1.246 | 1.277 | 0.5615657 | 0.194 |
| sll0364 | sll0364 | 2.834 | -2.812 | -0.108 | 1.758 | 0.601 | 3.578 | 0.8717770 | 0.119 |
| clpX | sll0535 | 0.736 | -2.966 | -1.054 | -1.121 | -1.503 | 0.01 | 0.7464408 | 0.035 |
| sll0481 | sll0481 | 2.141 | -3.371 | -1.601 | -1.335 | 0.485 | 1.056 | 0.9686290 | 0.002 |
| sll0877 | sll0877 | 1.76 | -3.694 | -2 | -1.39 | -1.273 | 0.847 | 0.7839636 | 0.032 |
| ycf19 | ssr2142 | 1.071 | -3.914 | -0.278 | 0.284 | -2.735 | 1.742 | 0.5878787 | 0.184 |
df_linreg_wide %>% color_table("-N")
| sgRNA_target | locus | carbon | light | -N | +FL | +G | +D | r_squared | pval_-N |
|---|---|---|---|---|---|---|---|---|---|
| atpA | sll1326 | -0.989 | -0.069 | 2.047 | 0.064 | 0.272 | -1.903 | 0.3425321 | 0.396 |
| sll1878 | sll1878 | -1.585 | 4.291 | 2.016 | 2.165 | 1.301 | -0.213 | 0.7988744 | 0.179 |
| sll0176 | sll0176 | -0.408 | -2.085 | -2.021 | -1.717 | -1.526 | 0.952 | 0.6551709 | 0.168 |
| slr1079 | slr1079 | 1.742 | -1.553 | -2.022 | -1.193 | -0.394 | 1.843 | 0.5565884 | 0.304 |
| sll5046 | sll5046 | 1.34 | -1.273 | -2.118 | -0.985 | -0.329 | 1.365 | 0.6360797 | 0.152 |
| ssr3532 | ssr3532 | 2.303 | -1.661 | -3.733 | -1.464 | 0.702 | 1.64 | 0.6071408 | 0.201 |
df_linreg_wide %>% color_table("+FL")
| sgRNA_target | locus | carbon | light | -N | +FL | +G | +D | r_squared | pval_+FL |
|---|---|---|---|---|---|---|---|---|---|
| rps15 | ssl1784 | -2.769 | 1.647 | 1.312 | 2.437 | -0.758 | -1.499 | 0.7223751 | 0.204 |
| pyrG | sll1443 | -1.738 | 0.567 | -0.111 | 2.307 | -1.107 | -0.398 | 0.6651024 | 0.212 |
| sll0217 | sll0217 | -2.418 | 1.723 | 0.954 | 2.228 | -0.286 | -1.152 | 0.6281035 | 0.258 |
| tufA | sll1099 | -1.461 | 2.041 | 1.399 | 2.218 | 0.036 | -1.078 | 0.7057584 | 0.112 |
| sll1878 | sll1878 | -1.585 | 4.291 | 2.016 | 2.165 | 1.301 | -0.213 | 0.7988744 | 0.085 |
| sll0933 | sll0933 | -2.634 | 1.271 | 0.081 | 2.115 | -0.633 | -0.702 | 0.6558952 | 0.294 |
| fus2 | sll1098 | -1.02 | 1.482 | 0.038 | 2.077 | 0.13 | -0.815 | 0.7239763 | 0.087 |
| slr0007 | slr0007 | -2.308 | 1.932 | 0.695 | 2.012 | -0.851 | -1.068 | 0.6792795 | 0.267 |
| ribC | sll0300 | -1.073 | -0.365 | -0.267 | -2.057 | -0.721 | -0.724 | 0.7973653 | 0.037 |
| entC | slr0817 | -0.511 | -0.844 | -0.456 | -2.123 | 2.558 | 0.74 | 0.9773035 | 0.009 |
| ribA | sll1894 | -0.927 | -0.39 | -0.139 | -2.37 | -0.725 | -0.64 | 0.6200967 | 0.101 |
| sll1521 | sll1521 | -0.353 | -0.754 | -0.52 | -3.993 | -0.671 | -0.342 | 0.9636366 | 0.001 |
df_linreg_wide %>% color_table("+G")
| sgRNA_target | locus | carbon | light | -N | +FL | +G | +D | r_squared | pval_+G |
|---|---|---|---|---|---|---|---|---|---|
| apcE | slr0335 | -0.681 | 5.071 | 0.314 | 1.093 | 3.962 | 1.65 | 0.8255851 | 0.056 |
| dgt | sll0398 | -0.463 | 0.606 | 0.269 | 1.052 | 3.782 | -0.261 | 0.9841219 | 0.000 |
| psbO | sll0427 | 0.64 | 2.707 | 0.222 | 0.633 | 3.339 | 1.65 | 0.8838887 | 0.021 |
| apcA | slr2067 | 0.027 | 4.441 | 0.576 | 0.809 | 3.301 | 2.922 | 0.7380667 | 0.142 |
| psbC | sll0851 | -0.001 | 1.691 | 0.24 | 0.281 | 3.288 | 3.75 | 0.9364716 | 0.019 |
| sll1496 | sll1496 | 1.499 | -1.248 | -0.507 | -0.657 | 3.237 | -2.22 | 0.9150619 | 0.013 |
| psbD | sll0849 | -0.861 | 2.664 | 1.002 | -0.005 | 3.231 | 3.101 | 0.8557861 | 0.072 |
| psbJ | smr0008 | 0.192 | 2.968 | 0.293 | 0.419 | 3.216 | 3.222 | 0.8923081 | 0.037 |
| slr0758 | slr0758 | -0.644 | 1.944 | 0.251 | -0.176 | 2.791 | -1.499 | 0.9317621 | 0.005 |
| sll1378 | sll1378 | -0.955 | 3.701 | -0.4 | 1.2 | 2.782 | 0.937 | 0.9545533 | 0.006 |
| slr1102 | slr1102 | 0.056 | 3.162 | -0.118 | 0.511 | 2.737 | 0.819 | 0.8207709 | 0.050 |
| sll0556 | sll0556 | 1.268 | 0.982 | 0.25 | -0.377 | 2.727 | 0.121 | 0.7859377 | 0.060 |
| slr2070 | slr2070 | 0.846 | -0.597 | -0.408 | -0.608 | 2.711 | -1.752 | 0.7416662 | 0.075 |
| cbbA | sll0018 | -0.644 | 1.164 | 0.415 | 0.545 | 2.607 | 2.054 | 0.9537621 | 0.008 |
| entC | slr0817 | -0.511 | -0.844 | -0.456 | -2.123 | 2.558 | 0.74 | 0.9773035 | 0.005 |
| cpcB | sll1577 | 0.12 | 4.032 | 0.141 | 0.74 | 2.544 | 2.476 | 0.8276584 | 0.093 |
| ssr2062 | ssr2062 | 0.5 | 0.071 | 0.167 | -0.086 | 2.528 | -1.616 | 0.9322018 | 0.004 |
| rpiA | slr0194 | -0.278 | 2.194 | 1.674 | 1.505 | 2.476 | -1.452 | 0.9605876 | 0.001 |
| slr1990 | slr1990 | -0.768 | 3.219 | 0.197 | 0.994 | 2.472 | 2.429 | 0.8920252 | 0.047 |
| trx | sll1057 | -0.491 | 1.161 | -0.106 | 0.132 | 2.417 | 0.6 | 0.9611652 | 0.003 |
| sll6055 | sll6055 | -1.122 | 3.123 | 0.558 | 0.514 | 2.406 | 2.128 | 0.8376624 | 0.094 |
| psbH | ssl2598 | 0.265 | 1.696 | 0.215 | -0.134 | 2.349 | 1.235 | 0.8920776 | 0.024 |
| cyp2 | slr0574 | -0.6 | 3.073 | 0.867 | 1.027 | 2.347 | -0.438 | 0.9444943 | 0.003 |
| ccsA | sll1513 | 0.287 | 1.628 | 0.096 | 0.428 | 2.321 | 0.814 | 0.9010260 | 0.013 |
| cpcC2 | sll1579 | -0.036 | 1.144 | 0.101 | 0.224 | 2.32 | 0.641 | 0.9783295 | 0.001 |
| cpcG | slr2051 | -0.205 | 2.621 | 0.183 | 0.472 | 2.294 | 1.485 | 0.9132484 | 0.019 |
| sll0062 | sll0062 | 0.605 | 1.181 | 0.165 | 0.639 | 2.262 | 1.004 | 0.8522771 | 0.027 |
| ndhF | slr2009 | 0.221 | 2.237 | 0.905 | 0.613 | 2.258 | 3.08 | 0.9355070 | 0.020 |
| cpcA | sll1578 | 0.11 | 3.57 | 0.261 | 0.609 | 2.255 | 2.162 | 0.8434275 | 0.079 |
| ccmK4 | slr1839 | -0.233 | 2.073 | 0.584 | 0.413 | 2.248 | 0.326 | 0.7885394 | 0.045 |
| apcB | slr1986 | 0.627 | 2.876 | 0.319 | 0.432 | 2.241 | 2.188 | 0.7619664 | 0.130 |
| sll0301 | sll0301 | 0.045 | 2.33 | -0.396 | -0.094 | 2.231 | 0.605 | 0.9108020 | 0.018 |
| slr1505 | slr1505 | -0.682 | 2.926 | 0.456 | 0.864 | 2.222 | 3.068 | 0.9263147 | 0.035 |
| hhoB | sll1427 | -0.04 | 2.017 | -0.198 | 0.237 | 2.207 | 0.158 | 0.7804217 | 0.056 |
| rub | slr2033 | 0.214 | 1.429 | 0.237 | -0.416 | 2.109 | 1.991 | 0.9937546 | 0.000 |
| petH | slr1643 | -1.23 | 2.351 | 1.048 | 1.592 | 2.039 | -2.97 | 0.7531722 | 0.085 |
| ssr0657 | ssr0657 | 0.935 | -2.358 | -0.659 | -0.992 | -2.021 | 1.496 | 0.9170055 | 0.007 |
| fabF2 | slr1332 | 0.401 | -2.251 | 0.251 | 0.187 | -2.097 | 0.38 | 0.9209766 | 0.012 |
| sll0498 | sll0498 | 0.407 | -2.577 | -1.166 | -1.108 | -2.148 | -1.967 | 0.8482665 | 0.053 |
| slr1974 | slr1974 | -1.295 | -0.202 | 0.509 | 0.64 | -2.153 | -0.861 | 0.7460569 | 0.127 |
| glcP | sll0771 | 0.113 | -0.381 | -0.143 | 0.008 | -2.164 | -1.027 | 0.9982143 | 0.000 |
| minE | ssl0546 | 0.975 | -1.768 | -1.032 | -0.849 | -2.194 | 0.18 | 0.8244200 | 0.032 |
| purA | sll1823 | -1.4 | -0.755 | 1.149 | 0.416 | -2.4 | -0.727 | 0.6987707 | 0.151 |
| glgA | sll1393 | -0.137 | -0.226 | 0.031 | -0.204 | -2.7 | 1.505 | 0.9275690 | 0.004 |
| pilT | sll1533 | -1.207 | -0.456 | -0.979 | 1.343 | -2.71 | 0.646 | 0.7070054 | 0.138 |
| ycf19 | ssr2142 | 1.071 | -3.914 | -0.278 | 0.284 | -2.735 | 1.742 | 0.5878787 | 0.226 |
| ssl3364 | ssl3364 | 0.214 | -1.289 | -0.623 | -0.271 | -3.033 | -0.61 | 0.9116291 | 0.010 |
| glk | sll0593 | -0.008 | -0.723 | -0.614 | -0.394 | -3.701 | -2.08 | 0.9860610 | 0.000 |
df_linreg_wide %>% color_table("+D")
| sgRNA_target | locus | carbon | light | -N | +FL | +G | +D | r_squared | pval_+D |
|---|---|---|---|---|---|---|---|---|---|
| psbB | slr0906 | -0.064 | 1.707 | 0.622 | -0.176 | 1.44 | 4.62 | 0.8709903 | 0.028 |
| psbE | ssr3451 | 0.191 | 1.559 | 0.07 | 0.163 | 1.443 | 3.846 | 0.9215596 | 0.012 |
| psbC | sll0851 | -0.001 | 1.691 | 0.24 | 0.281 | 3.288 | 3.75 | 0.9364716 | 0.030 |
| psbF | smr0006 | 0.201 | 1.328 | 0.045 | -0.021 | 1.078 | 3.584 | 0.9289141 | 0.009 |
| sll0364 | sll0364 | 2.834 | -2.812 | -0.108 | 1.758 | 0.601 | 3.578 | 0.8717770 | 0.072 |
| rpoDI | slr0653 | 1.855 | -0.679 | -1.12 | -1.39 | -1.275 | 3.412 | 0.5935773 | 0.155 |
| psbD2 | slr0927 | -0.005 | 1.286 | 0.228 | -0.241 | 1.923 | 3.353 | 0.9093335 | 0.031 |
| psbJ | smr0008 | 0.192 | 2.968 | 0.293 | 0.419 | 3.216 | 3.222 | 0.8923081 | 0.079 |
| psbD | sll0849 | -0.861 | 2.664 | 1.002 | -0.005 | 3.231 | 3.101 | 0.8557861 | 0.152 |
| ndhF | slr2009 | 0.221 | 2.237 | 0.905 | 0.613 | 2.258 | 3.08 | 0.9355070 | 0.018 |
| slr1505 | slr1505 | -0.682 | 2.926 | 0.456 | 0.864 | 2.222 | 3.068 | 0.9263147 | 0.031 |
| tktA | sll1070 | 0.198 | 0.59 | 0.503 | 0.822 | 1.981 | 2.992 | 0.9639206 | 0.006 |
| apcA | slr2067 | 0.027 | 4.441 | 0.576 | 0.809 | 3.301 | 2.922 | 0.7380667 | 0.289 |
| lysA | sll0504 | -0.206 | 0.251 | 0.394 | 0.088 | 0.396 | 2.767 | 0.9715674 | 0.001 |
| cpcB | sll1577 | 0.12 | 4.032 | 0.141 | 0.74 | 2.544 | 2.476 | 0.8276584 | 0.181 |
| slr1990 | slr1990 | -0.768 | 3.219 | 0.197 | 0.994 | 2.472 | 2.429 | 0.8920252 | 0.102 |
| psbL | smr0007 | 0.139 | 1.635 | 0.417 | 0.57 | 1.385 | 2.402 | 0.8022242 | 0.083 |
| apcB | slr1986 | 0.627 | 2.876 | 0.319 | 0.432 | 2.241 | 2.188 | 0.7619664 | 0.232 |
| slr0734 | slr0734 | 0.52 | 2.819 | 0.607 | 0.753 | 1.99 | 2.183 | 0.8211958 | 0.134 |
| hemA | slr1808 | 0.905 | 1.077 | -0.68 | -0.639 | 1.149 | 2.174 | 0.5761540 | 0.327 |
| cpcA | sll1578 | 0.11 | 3.57 | 0.261 | 0.609 | 2.255 | 2.162 | 0.8434275 | 0.164 |
| ftsZ | sll1633 | 2.34 | -2.314 | -1.677 | -1.439 | 0.318 | 2.136 | 0.9249829 | 0.054 |
| sll6055 | sll6055 | -1.122 | 3.123 | 0.558 | 0.514 | 2.406 | 2.128 | 0.8376624 | 0.217 |
| sll0930 | sll0930 | 0.084 | 0.116 | 0.329 | 0.368 | 1.209 | 2.055 | 0.8790537 | 0.046 |
| cbbA | sll0018 | -0.644 | 1.164 | 0.415 | 0.545 | 2.607 | 2.054 | 0.9537621 | 0.043 |
| sll0983 | sll0983 | 1.392 | -1.936 | 0.075 | -0.306 | -0.69 | 2.036 | 0.8763437 | 0.041 |
| glk | sll0593 | -0.008 | -0.723 | -0.614 | -0.394 | -3.701 | -2.08 | 0.9860610 | 0.012 |
| ndhC | slr1279 | 0.96 | -0.173 | 0.393 | -0.18 | -0.985 | -2.147 | 0.8635036 | 0.098 |
| ycf43 | sll0194 | -0.722 | -0.119 | 0.025 | 0.102 | 1.626 | -2.206 | 0.9735251 | 0.001 |
| sll1496 | sll1496 | 1.499 | -1.248 | -0.507 | -0.657 | 3.237 | -2.22 | 0.9150619 | 0.088 |
| pmgA | sll1968 | 1.162 | -1.905 | 0.387 | -1.343 | -1.769 | -2.242 | 0.5849242 | 0.429 |
| ssr2333 | ssr2333 | 0.305 | -1.552 | -0.003 | -0.034 | 0.305 | -2.319 | 0.4255312 | 0.285 |
| ssl0438 | ssl0438 | 0.12 | -1.474 | -0.18 | -0.419 | -0.219 | -2.375 | 0.8654067 | 0.019 |
| atpI | sll1322 | -1.547 | -0.422 | 1.294 | 0.662 | 0.252 | -2.475 | 0.4121488 | 0.353 |
| slr1098 | slr1098 | 0.903 | -2.261 | -0.199 | -0.434 | 0.479 | -2.543 | 0.4195152 | 0.367 |
| talB | slr1793 | 0.152 | -0.144 | -0.806 | -0.023 | 0.111 | -2.879 | 0.9813772 | 0.000 |
| petH | slr1643 | -1.23 | 2.351 | 1.048 | 1.592 | 2.039 | -2.97 | 0.7531722 | 0.066 |
| zwf | slr1843 | -0.079 | 0.297 | -0.149 | 0.054 | -0.043 | -2.998 | 0.9966699 | 0.000 |
| slr1245 | slr1245 | -2.694 | 1.826 | 1.245 | -0.805 | 1.092 | -3.224 | 0.8325233 | 0.067 |
Based on the coefficients of the multiple linear regression models, we can extract a shortlist of the most interesting uncharacterized (hypothetical) genes. These could warrant further investigation.
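The shortlist combines two filters. The first is an annotation filter: no short gene name, and a UniProt protein name that is either a locus-style placeholder or "Uncharacterized". The second is an effect filter: at least one regression coefficient with an absolute value above 3. The minimal base-R sketch below applies the same two rules to a hypothetical toy table. Only the column names mirror the notebook; the loci, protein names, and values are invented for illustration.

```r
# Hypothetical toy table: same column names as df_linreg_wide joined with
# the UniProt annotation, but invented loci, names, and coefficients
toy <- data.frame(
  locus = c("geneA", "geneB", "geneC", "geneD"),
  gene_name_short = c(NA, "abcX", NA, NA),
  protein = c("Uncharacterized protein", "Known enzyme",
              "Xyz0001 protein", "Uncharacterized protein"),
  carbon = c(0.5, 3.5, -3.2, 1.0),
  light = c(-3.4, 0.2, 0.1, 2.9),
  "-N" = c(0.1, -0.5, 0.3, -1.0),
  r_squared = c(0.80, 0.95, 0.90, 0.99),
  check.names = FALSE, stringsAsFactors = FALSE
)

# annotation filter: no short gene name and an "unknown protein" description
is_unknown <- is.na(toy$gene_name_short) &
  grepl("[a-zA-Z]{3}[0-9]{4} protein|Uncharacterized", toy$protein)

# effect filter: any treatment coefficient with |coefficient| > 3
coef_cols <- grep("^(carb|light|\\-|\\+)", names(toy), value = TRUE, perl = TRUE)
is_strong <- apply(abs(toy[, coef_cols]) > 3, 1, any)

# keep genes passing both filters, best-fitting models first
hits <- toy[is_unknown & is_strong, ]
hits <- hits$locus[order(hits$r_squared, decreasing = TRUE)]
print(hits)
# [1] "geneC" "geneA"
```

geneB passes the effect filter but is excluded because it has a gene name. geneD is annotated as uncharacterized, but none of its coefficients exceeds 3 in absolute value.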
list_top_unknown_hits <- df_linreg_wide %>%
left_join(df_uniprot, by = "locus") %>%
# filter by name: only unknown proteins
filter(
is.na(gene_name_short),
str_detect(protein, "[a-zA-Z]{3}[0-9]{4} protein|Uncharacterized")) %>%
# filter by effect: at least one absolute regression coefficient > 3
filter(if_any(matches("^(carb|light|\\-|\\+)"), ~ abs(.) > 3)) %>%
arrange(desc(r_squared)) %>%
pull(locus)
df_linreg_wide %>% filter(locus %in% list_top_unknown_hits) %>%
select(!starts_with("pval"), -sgRNA_target) %>%
mutate(across(2:7, ~ cell_spec(., "html", color = "white",
background = spec_color(., option = "E", scale_from = c(-5.5, 5.5)),
bold = TRUE))) %>%
kbl(format = "html", escape = FALSE) %>%
kable_paper("striped", full_width = FALSE)
| locus | carbon | light | -N | +FL | +G | +D | r_squared |
|---|---|---|---|---|---|---|---|
| sll0364 | 2.834 | -2.812 | -0.108 | 1.758 | 0.601 | 3.578 | 0.8717770 |
| sll0481 | 2.141 | -3.371 | -1.601 | -1.335 | 0.485 | 1.056 | 0.9686290 |
| sll0877 | 1.76 | -3.694 | -2 | -1.39 | -1.273 | 0.847 | 0.7839636 |
| sll1378 | -0.955 | 3.701 | -0.4 | 1.2 | 2.782 | 0.937 | 0.9545533 |
| sll6055 | -1.122 | 3.123 | 0.558 | 0.514 | 2.406 | 2.128 | 0.8376624 |
| slr1102 | 0.056 | 3.162 | -0.118 | 0.511 | 2.737 | 0.819 | 0.8207709 |
| slr1505 | -0.682 | 2.926 | 0.456 | 0.864 | 2.222 | 3.068 | 0.9263147 |
| slr1990 | -0.768 | 3.219 | 0.197 | 0.994 | 2.472 | 2.429 | 0.8920252 |
| ssl3364 | 0.214 | -1.289 | -0.623 | -0.271 | -3.033 | -0.61 | 0.9116291 |
| ssr3532 | 2.303 | -1.661 | -3.733 | -1.464 | 0.702 | 1.64 | 0.6071408 |
The table above lists the uncharacterized genes with the strongest treatment effects (at least one absolute regression coefficient > 3). To confirm the trends from the multiple linear regression models, the fitness of these genes per condition is plotted as a heatmap.
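The heatmap code clips fitness values to ±3.5 so that a few extreme values do not dominate the color scale, and it orders rows and columns by hierarchical clustering. The minimal base-R sketch below shows both steps on a hypothetical toy matrix. It assumes Euclidean distance and complete linkage, the defaults of `dist()` and `hclust()`. The settings used internally by `ggheatmap()` may differ.

```r
# Hypothetical toy fitness matrix: conditions in rows, genes in columns
fit <- matrix(c( 4.2, -0.5,  1.0,
                 3.9, -0.2,  0.8,
                -1.0,  2.5, -4.8,
                -1.2,  2.8, -3.9),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("cond", 1:4), paste0("gene", 1:3)))

# clip to [-3.5, 3.5]; pmin()/pmax() keep the matrix dimensions and names
capped <- pmin(pmax(fit, -3.5), 3.5)

# order rows (conditions) and columns (genes) by hierarchical clustering,
# using Euclidean distance and complete linkage (dist()/hclust() defaults)
row_order <- hclust(dist(capped))$order
col_order <- hclust(dist(t(capped)))$order
heat <- capped[row_order, col_order]
print(round(heat, 2))
```

After clipping, cond1 and cond2 have nearly identical profiles, so clustering places them next to each other. The same applies to cond3 and cond4.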
library(ggheatmap)
plot_sgRNAs_light <- df_gene %>%
# make df with list of strongest correlated genes
filter(locus %in% list_top_unknown_hits) %>%
# cap fitness at +/- 3.5 so that outliers do not dominate the color scale
mutate(wmean_fitness = wmean_fitness %>% replace(., . > 3.5, 3.5) %>% replace(., . < -3.5, -3.5)) %>%
# keep rows with time == 0, then reshape to wide format (one column per condition)
filter(time == 0) %>% select(sgRNA_target, condition, wmean_fitness) %>%
pivot_wider(names_from = condition, values_from = wmean_fitness) %>%
# matrix with conditions as rows and genes as columns
column_to_rownames("sgRNA_target") %>% t %>%
# plot heatmap with clustered rows and columns
ggheatmap(cluster_rows = TRUE, cluster_cols = TRUE,
color = brewer_pal(direction = -1, palette = "RdBu")(5),
show_cluster_rows = FALSE, cluster_num = c(1, 1),
tree_color_cols = grey(0.5)
) %>% ggheatmap_theme(plotlist = 1, theme = list(
custom_theme() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
axis.title.x = element_blank(), axis.title.y = element_blank())
))
Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> = "none")` instead.
print(plot_sgRNAs_light)
save_plot(plot_sgRNAs_light, width = 8.0, height = 3.5)
Summary
- sll0364 - 139 AA. KD has higher fitness in HC conditions and lower fitness in HL. Negatively regulating carbon metabolism?
- sll0481 - 155 AA. KD has higher fitness in +G conditions and lower fitness in HL. Membrane localization. Negatively regulating glycolysis?
- sll0877 - 456 AA. KD has higher fitness only in HC,LL. Mitigates light limitation?
- ssl3364 - 74 AA. KD has lower fitness in all HC/+G conditions. This protein is known as CP12, regulating carbon flux at GAPDH and PRK.
- ssr3532 - 80 AA. KD has lower fitness under N limitation and C limitation (LC-HL combinations). Same operon as the glutaminase glsA (slr2079, catalyzes deamidation of gln → glu). Regulatory, involved in N metabolism?
- slr1990 - 240 AA, 5 TM domains. KD has higher fitness in photoheterotrophy and lower fitness in all HC/LL conditions. Something important for the photosystems? Something that wastes electrons in photoheterotrophic conditions?
- sll6055 - 152 AA. Fitness profile as above. Multiubiquitin domain; involved in protein modification/degradation of PS proteins?
- slr1505 - 198 AA. Fitness profile as above. No useful information.
- sll1378 - 300 AA. KD has lower fitness in all LL conditions. Membrane-associated protein? In STRING, potential interaction with PbsA1 and PbsA2 (heme oxygenase 1 and 2). Potentially important for chlorophyll or heme biosynthesis → would explain its importance for photosynthesis in LL conditions.
- slr1102 - 853 AA. KD has lower fitness in all LL conditions. Four known domains (source: InterPro): FHA (forkhead-associated domain, a phosphopeptide recognition domain found in many regulatory proteins); PAS (signaling domain, often found in circadian proteins, detecting its signal via an associated cofactor such as heme or flavin); GGDEF (involved in signal transduction, likely catalyzing synthesis or hydrolysis of cyclic diguanylate, c-di-GMP); EAL (shown to stimulate degradation of the second messenger c-di-GMP, candidate for a diguanylate phosphodiesterase function; together with the GGDEF domain, EAL might be involved in regulating cell surface adhesiveness in bacteria). Embedded in a tight network of interacting proteins, all involved in chromophore biosynthesis/maturation.

Repression mutants of the apc and cpc genes, which encode phycobilisome proteins, are also enriched in high light.
plot_sgRNAs_phycobil <- df_gene %>%
filter(str_detect(gene_name, "[ac]pc"), time == 0) %>%
mutate(wmean_fitness = wmean_fitness %>% replace(., . > 4, 4) %>% replace(., . < -4, -4)) %>%
ggplot(aes(x = sgRNA_target, y = condition, fill = wmean_fitness)) +
geom_tile() + custom_theme() +
labs(title = "Apc/Cpc mutants enriched in high light/CO2", x = "", y = "") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
scale_fill_distiller(palette = "RdBu", limits = c(-4, 4))
print(plot_sgRNAs_phycobil)
save_plot(plot_sgRNAs_phycobil, width = 6.5, height = 3.5)
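We can plot selected conditions against each other and add gene labels to find or confirm particular patterns. For each pair of conditions, genes whose fitness difference falls outside the 0.3 % to 99.7 % quantiles are highlighted and labeled. Dashed lines mark equal fitness (y = x) and offsets of ±4.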
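The highlighting rule is quantile-based. A gene is flagged when its fitness difference lies outside the 0.3 % and 99.7 % quantiles of all differences. The base-R sketch below reproduces this rule on hypothetical, randomly generated fitness values for two conditions. It mirrors `!dplyr::between()`, which treats both bounds as inclusive.

```r
set.seed(42)
# Hypothetical fitness values for 1000 genes in two conditions
fit_x <- rnorm(1000)
fit_y <- fit_x + rnorm(1000, sd = 0.5)

# difference between conditions and its 0.3 % / 99.7 % quantiles
dfit <- fit_x - fit_y
limits <- quantile(dfit, probs = c(0.003, 0.997))

# genes outside the inclusive range [lower, upper] are flagged,
# equivalent to !dplyr::between(dfit, lower, upper)
significant <- dfit < limits[[1]] | dfit > limits[[2]]
sum(significant)  # 6 (3 per tail)
```

Because the cut-off is defined by quantiles, the rule always flags roughly 0.6 % of genes, whether or not the two conditions differ overall. It therefore ranks genes by their fitness difference rather than testing statistical significance.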
make_fitness_plot <- function(data, vars, title = NULL) {
# reshape to one fitness column per condition, for the two selected conditions
data %>% ungroup %>%
filter(condition %in% vars, sgRNA_type == "gene") %>%
select(locus, sgRNA_target, condition, wmean_fitness) %>% distinct %>%
pivot_wider(names_from = condition, values_from = wmean_fitness) %>%
mutate(
# fitness difference between the two conditions
dfit = .data[[vars[1]]] - .data[[vars[2]]],
# flag genes in the outer 0.3 % tails of the difference; label only those
significant = !between(dfit, quantile(dfit, probs = 0.003),
quantile(dfit, probs = 0.997)),
sgRNA_target = if_else(significant, sgRNA_target, "")) %>%
# plot
ggplot(aes(x = .data[[vars[1]]], y = .data[[vars[2]]],
color = significant, label = sgRNA_target)) +
geom_point(size = 1) + custom_theme(legend.position = 0) +
# dashed reference lines: y = x and y = x +/- 4
geom_abline(intercept = 0, slope = 1, col = grey(0.5), lty = 2, size = 0.8) +
geom_abline(intercept = 4, slope = 1, col = grey(0.5), lty = 2, size = 0.8) +
geom_abline(intercept = -4, slope = 1, col = grey(0.5), lty = 2, size = 0.8) +
geom_text_repel(size = 3, max.overlaps = 50) +
labs(title = title, x = vars[1], y = vars[2]) +
coord_cartesian(xlim = c(-9, 5), ylim = c(-9, 5)) +
scale_color_manual(values = c(grey(0.5), custom_colors[2]))
}
# compare all pairs of conditions; the helper below returns TRUE for self-pairs
# (x == y) and for pairs whose reverse (y, x) occurs earlier, so each pair is kept once
duplicated_2vec <- function(x, y) {
xy = paste(x, y); yx = paste(y, x)
sapply(xy, function(xval) {
which(xval == yx) <= which(xval == xy)
})
}
list_condition_pairs <- lapply(
unique(df_gene$condition) %>% expand_grid(x = ., y = .) %>%
filter(!duplicated_2vec(x, y)) %>% t %>% as.data.frame %>% as.list,
function(var) {
make_fitness_plot(df_gene, vars = var,
title = paste(var, collapse = " - "))
}
)
# export images
invisible(capture.output(
lapply(list_condition_pairs, function(pl) {
pl_name <- paste0("../figures/pairwise_comparisons/plot_", pl$labels$x, "_", pl$labels$y, ".png")
png(filename = pl_name, width = 800, height = 800, res = 120)
print(pl)
dev.off()
})
))
# example of first 4 combinations
list_condition_pairs[1:4]
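After filtering on the negation of `duplicated_2vec()`, each unordered pair of distinct conditions remains exactly once, which gives n(n - 1)/2 pairs for n conditions. Base R's `combn()` produces the same set directly. The sketch below checks this on four hypothetical condition names. It uses base `expand.grid()` in place of `tidyr::expand_grid()`, which changes the row order of the grid but not the resulting set of pairs.

```r
# Hypothetical condition names (the real ones come from df_gene$condition)
conditions <- c("LC-LL", "HC-HL", "+G", "-N")

# helper as defined in the notebook
duplicated_2vec <- function(x, y) {
  xy <- paste(x, y); yx <- paste(y, x)
  sapply(xy, function(xval) {
    which(xval == yx) <= which(xval == xy)
  })
}

# full grid, then drop self-pairs and reversed duplicates
grid <- expand.grid(x = conditions, y = conditions, stringsAsFactors = FALSE)
kept <- grid[!duplicated_2vec(grid$x, grid$y), ]

# the same unordered pairs, generated directly
pairs <- combn(conditions, 2, simplify = FALSE)

# compare both sets, ignoring order within and between pairs
key <- function(a, b) paste(pmin(a, b), pmax(a, b), sep = " | ")
same <- setequal(key(kept$x, kept$y),
                 vapply(pairs, function(p) key(p[1], p[2]), character(1)))
length(pairs)  # 6, i.e. choose(4, 2)
same           # TRUE
```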
To plot gene fitness for the enzymes of central carbon metabolism, we use the complete list of enzymes and the genes they are mapped to (obtained from KEGG). From this table, we extract the gene sets of specific pathways and plot their fitness. We start with glycolysis and Calvin cycle enzymes.
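A pathway gene set is the set of unique loci annotated with that pathway name. The base-R sketch below builds such sets from a hypothetical excerpt of a locus-to-pathway table. The excerpt has the same two columns as `df_kegg` (`locus` and `kegg_pathway`), but the loci are invented. It also assumes that a locus can occur more than once for the same pathway.

```r
# Hypothetical excerpt of a locus-to-pathway table with the columns of df_kegg
kegg_toy <- data.frame(
  locus = c("geneA", "geneA", "geneB", "geneC", "geneD"),
  kegg_pathway = c(rep("Glycolysis / Gluconeogenesis", 2),
                   rep("Pentose phosphate pathway", 3)),
  stringsAsFactors = FALSE
)

# one character vector of unique loci per pathway
gene_sets <- lapply(split(kegg_toy$locus, kegg_toy$kegg_pathway), unique)
gene_sets[["Pentose phosphate pathway"]]
# [1] "geneB" "geneC" "geneD"
```

Deduplicating matters when a gene set is joined to the fitness table. In an inner join, each repeated locus row would also repeat that gene's fitness rows.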
list_central_met_pathways <- c(
"Glycolysis / Gluconeogenesis",
"Pentose phosphate pathway",
"Carbon fixation in photosynthetic organisms",
"Photosynthesis",
"Photosynthesis - antenna proteins",
"Citrate cycle (TCA cycle)",
"Pyruvate metabolism",
"Glyoxylate and dicarboxylate metabolism"
)
plot_gene_fitness <- function(df, pw = NULL, gene = NULL, title = NULL) {
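# keep only rows with time == 0
df <- df %>% filter(time == 0)
if (!is.null(pw)) {
# restrict to the loci of one KEGG pathway; distinct() guards against
# duplicated rows if a locus is listed more than once for that pathway
df <- df %>% inner_join(df_kegg %>% filter(kegg_pathway == pw) %>% select(locus) %>% distinct,
by = "locus")
title <- pw
} else if (!is.null(gene)) {
# alternatively, restrict to a custom list of loci
df <- df %>% filter(locus %in% gene)
}
# bars: weighted mean fitness per condition; error bars: +/- sd_fitness
ggplot(df, aes(x = condition, y = wmean_fitness,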
ymin = wmean_fitness-sd_fitness,
ymax = wmean_fitness+sd_fitness, fill = condition, color = condition)) +
geom_col(position = "dodge", width = 0.6) +
geom_errorbar(position = "dodge", width = 0.6, size = 1) +
custom_theme(aspect.ratio = 1,
legend.position = "bottom", legend.key.size = unit(0.4, "cm")) +
labs(title = title) +
theme(axis.text.x = element_blank()) +
scale_fill_manual(values = colorRampPalette(custom_colors[1:5])(11)) +
scale_color_manual(values = colorRampPalette(custom_colors[1:5])(11)) +
facet_wrap(~ sgRNA_target, ncol = 8)
}
print(plot_gene_fitness(df_gene, pw = list_central_met_pathways[[1]]))
ggsave("../figures/plot_fitness_glycolysis.svg",
plot_gene_fitness(df_gene, pw = list_central_met_pathways[[1]]),
width = 8, height = 6)
print(plot_gene_fitness(df_gene, pw = list_central_met_pathways[[2]]))
ggsave("../figures/plot_fitness_pentose.svg",
plot_gene_fitness(df_gene, pw = list_central_met_pathways[[2]]),
width = 8, height = 5)
print(plot_gene_fitness(df_gene, pw = list_central_met_pathways[[3]]))
ggsave("../figures/plot_fitness_carbonfix.svg",
plot_gene_fitness(df_gene, pw = list_central_met_pathways[[3]]),
width = 8, height = 5)
print(plot_gene_fitness(df_gene, pw = list_central_met_pathways[[4]]))
ggsave("../figures/plot_fitness_photosys.svg",
plot_gene_fitness(df_gene, pw = list_central_met_pathways[[4]]),
width = 8, height = 11)
print(plot_gene_fitness(df_gene, pw = list_central_met_pathways[[5]]))
ggsave("../figures/plot_fitness_antenna.svg",
plot_gene_fitness(df_gene, pw = list_central_met_pathways[[5]]),
width = 8, height = 4)
print(plot_gene_fitness(df_gene, pw = list_central_met_pathways[[6]]))
ggsave("../figures/plot_fitness_citrate.svg",
plot_gene_fitness(df_gene, pw = list_central_met_pathways[[6]]),
width = 8, height = 4)
Custom set of stress-related genes
OCP (slr1963), pgr5 (ssr2016), Flv1 (sll1521), Flv2 (sll0219), Flv3 (sll0550), Flv4 (sll0217), sll0218 (in flv2/4 operon), SigB (sll0306), SigC (sll0184), SigD (sll2012), SigE (sll1689).
list_stress_genes <- c("slr1963", "ssr2016", "sll1521",
"sll0219", "sll0550", "sll0217", "sll0218", "sll0306", "sll0184",
"sll2012", "sll1689")
plot_gene_fitness(df_gene, gene = list_stress_genes, title = "stress related genes")
ggsave("../figures/plot_fitness_stress.svg",
plot_gene_fitness(df_gene, gene = list_stress_genes, title = "stress related genes"),
width = 8, height = 4)
Fitness plot for carbon transporters, carboxysome genes, and associated regulators
slr1512 sbtA, slr1513 sbtB, sll1734 cupA, slr1302 cupB, sll0359 cyabrB1, sll0822 cyabrB2, sll1594 ccmR, sll1031 ccmM, sll1028 ccmK2, sll1029 ccmK1, sll1032 ccmN, slr0436 ccmO, sll1030 ccmL.
list_carboxysome_genes <- c(
"slr1512", "slr1513", "sll1734", "slr1302", "sll0359", "sll0822", "sll1594",
"sll1031", "sll1028", "sll1029", "sll1032", "slr0436", "sll1030"
)
plot_gene_fitness(df_gene, gene = list_carboxysome_genes,
title = "Carboxysome and carbon transporters")
ggsave("../figures/plot_fitness_carboxysome.svg",
plot_gene_fitness(df_gene, gene = list_carboxysome_genes,
title = "Carboxysome and carbon transporters"),
width = 8, height = 4)
Genes whose KD leads to increased fitness (f > 2 in at least one condition)
list_genes_pos_fitness <- df_gene %>%
filter(time == 0, !is.na(locus), wmean_fitness > 2) %>%
pull(locus) %>% unique
plot_gene_fitness(df_gene, gene = list_genes_pos_fitness, title = "Genes with increased fitness (f > 2)")
ggsave("../figures/plot_fitness_increased.svg",
plot_gene_fitness(df_gene, gene = list_genes_pos_fitness, title = "Genes with increased fitness (f > 2)"),
width = 8, height = 8)
Summary:
- pmgA is once again the gene with the strongest and most widespread fitness increase, validating results from library V1.
- slr1916 has the same phenotype as pmgA, just weaker. We also know this one from before; it must have the same role as pmgA.
- All PSII genes show increased fitness in the photoheterotrophic condition → PS is a burden here.
- sll0689, pxcA, slr1609: all have increased fitness in HC,HL. The first two are Na+/CO2 (?) transporters. slr1609, which we know from before, is annotated as a fatty acid CoA ligase, but it is probably something different.
- sll6055, slr1505, slr1990: all have increased fitness in the photoheterotrophic condition and decreased fitness in HC/LL conditions. Not much is known about these genes; they probably have a role in photosynthesis, as the pattern is similar to that of the psb genes (PSII maturation?).
- slr0813, slr0907, slr0909, slr1299: all have increased fitness in HC/LL. It is not clear what connects these genes functionally.
Export a summary table of all genes and conditions so that others can easily look up individual conditions, for example for one-by-one fitness comparisons. The wide format (one column per condition) is best suited for this.
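For illustration, the base-R sketch below turns a hypothetical two-gene, long-format table into the wide layout of the exported summary, with one row per gene and one column per condition. The notebook itself uses `tidyr::pivot_wider()`. Here `reshape()` serves only as a dependency-free stand-in, and the gene and condition names are made up.

```r
# Hypothetical long-format table: one row per gene and condition
long <- data.frame(
  locus = rep(c("geneA", "geneB"), each = 3),
  condition = rep(c("LC-LL", "HC-HL", "+G"), times = 2),
  wmean_fitness = c(0.1, -2.3, 1.5, 0.4, 0.0, -1.1),
  stringsAsFactors = FALSE
)

# wide format: one row per gene, one fitness column per condition
wide <- reshape(long, idvar = "locus", timevar = "condition",
                direction = "wide")
# reshape() names the new columns "wmean_fitness.<condition>"; strip the prefix
names(wide) <- sub("^wmean_fitness\\.", "", names(wide))
rownames(wide) <- NULL
wide
```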
df_gene %>% ungroup %>%
filter(sgRNA_type == "gene") %>%
select(locus, sgRNA_target, gene_name, condition, wmean_fitness) %>%
distinct %>%
pivot_wider(names_from = condition, values_from = wmean_fitness) %>%
write_csv("../data/output/fitness_summary.csv")
df_gene %>%
filter(sgRNA_type == "gene") %>%
write_csv("../data/output/fitness_genes.csv")
df_kegg %>% write_csv("../data/output/kegg_annotation.csv")
sessionInfo()